Abstract

Department of Defense (DoD) healthcare is one of the largest contributors to the
DoD budget. In recent years, the cost of the DoD healthcare system has risen at an
exponential rate. Much research has been conducted on the impacts that continuity of
care has on both improving the quality of patient care and on reducing healthcare costs in
the private sector. The DoD has attempted to take a similar approach with regard to
healthcare continuity as a means to reduce healthcare costs. This research investigates
whether continuity of care influences costs and a military member’s availability to
perform duties. Specifically, this research examines Air Force fliers with
musculoskeletal injuries. Linear and logistic regression techniques are utilized to
interpret the relationship continuity of care has with both patient availability and costs. The
study does not identify any relationship between continuity of care and either costs or patient
availability. These findings suggest the need for further research as to whether these
findings regarding continuity of care extend beyond musculoskeletal injuries within the
DoD healthcare system, as well as evaluating other potential outcomes for continuity of
care. Research should also be conducted to determine other factors influencing costs and
patient availability.

USING DATA MINING TO DETERMINE THE IMPACT CONTINUITY OF
CARE HAS ON THE AIR FORCE’S HEALTHCARE SYSTEM


I.  Introduction 


Background 

The United States (U.S.) has seen rapid growth in healthcare costs; this rapid
growth poses a major threat to the country’s economic security and the security of its
citizens (Schieber, et al., 2009). Though healthcare costs in the U.S. have grown faster
than in other similarly advanced and developed countries, the quality of care has not grown at a
comparable rate. The U.S. is experiencing lower life expectancy and higher infant
mortality than other countries with lower healthcare costs (Farrell, 2008). The rate at
which healthcare costs in the U.S. are growing is unsustainable (Mitchell, 2013). But
why is the cost of healthcare within the U.S. increasing so rapidly? The rapid increase in
healthcare cost can be attributed to many different factors. Technology is the factor most
commonly cited for healthcare cost growth. New technology is estimated to
account for between 38 and 65 percent of cost growth in the U.S. healthcare system
(Schieber, et al., 2009). Administrative costs also account for a great portion of
healthcare cost growth, with an average growth rate of 7 percent between 1995
and 2005 (Farrell, 2008). Lastly, price insensitivity of patients coupled with healthcare
providers’ fear of malpractice lawsuits drives providers to implement the most costly
treatment options rather than lower cost treatment options (Farrell, 2008). Though many
other factors have been noted as contributing to healthcare cost growth, these are some
of the major drivers of the unsustainable growth of healthcare costs within the U.S.

Though all of these factors are noted as having adversely affected healthcare in
the private sector, government programs are not exempt from some of these same issues.
Specifically, healthcare costs in the Department of Defense (DoD) are also rising at a
rapid pace. Some of the factors affecting DoD healthcare costs include expanded benefits
and increased usage of healthcare benefits by eligible beneficiaries. DoD healthcare
accounts for nearly one tenth of the total DoD budget (Harrison, 2010). In an
environment of economic conservatism, finding ways to decrease costs for government
programs, particularly DoD healthcare, is highly desirable. In accordance with the Pareto
Principle, it is assumed that 20% of patients within the healthcare system consume 80%
of the resources and thus account for the majority of healthcare costs (Weinberg, 2009).
Identifying the hypothesized high cost group within the DoD that accounts for a majority
of costs and determining trending characteristics of this population could be beneficial in
forecasting ways of preventing common healthcare issues and could ultimately reduce
costs. One potential solution to reduce healthcare costs, while improving quality, is to
increase continuity of care. Evidence suggests that there is an association between higher
continuity of care and lower healthcare costs (Kristjansson, et al., 2013; Mainous & Gill,
1998).

Continuity of care is the continual provision of care to a patient by the same healthcare
provider over time. Over time the healthcare provider can establish rapport with
the patient, identify patient trends, minimize repeat diagnostic testing, and provide more
effective, higher quality care. Substantial literature has been developed on healthcare
continuity in the private sector. The DoD implements continuity of care by requiring
compliance with the standards and guidelines of the Patient Centered Medical Home
(PCMH) model, which requires the continuity of medical record information at all times and
the monitoring of the percentage of patient visits with a selected clinician or team (PCMH,
2014). Unfortunately, unique attributes of the DoD, such as frequent deployments and
relocations, make healthcare continuity more difficult than in the private sector.

Problem  Statement 

The 711th Human Performance Wing (HPW) at Wright-Patterson Air Force Base
in Ohio has a vested interest in data analysis that can identify ways to reduce healthcare
costs within the Air Force. The current Air Force healthcare model specifies that patients
should meet with their primary care manager (PCM), also known as primary care provider,
for at least 90% of their appointments and should meet with a member of their PCM team
for at least 70% of their appointments. Although the Air Force healthcare model accounts
for continuity of care, no empirical analysis or evidence exists that validates its benefits.
This research seeks to fill this gap by evaluating the impact continuity of care has on
healthcare costs and the readiness of Air Force personnel. To effectively conduct this
analysis, this research limits its evaluation to active duty fliers with musculoskeletal
injuries (MSIs) due to the type of data available. For more discussion on the selection of
this subpopulation, see the section Defining Cost Groups.
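
The continuity targets described above can be expressed as simple visit shares. The following is a minimal sketch, not the Air Force's actual computation: the provider identifiers, the function name `continuity_shares`, and the choice to count the PCM as a member of the PCM team are all assumptions made for illustration.

```python
from collections import Counter


def continuity_shares(visit_providers, pcm, pcm_team):
    """Return (pcm_share, team_share) for one patient's appointments.

    visit_providers: provider ID seen at each appointment (hypothetical IDs).
    pcm: ID of the patient's assigned primary care manager.
    pcm_team: set of provider IDs on the PCM team (assumed here to include the PCM).
    """
    if not visit_providers:
        raise ValueError("patient has no appointments")
    counts = Counter(visit_providers)
    total = len(visit_providers)
    pcm_share = counts[pcm] / total
    team_share = sum(n for provider, n in counts.items() if provider in pcm_team) / total
    return pcm_share, team_share


# Made-up example: five appointments, three of them with the assigned PCM.
visits = ["dr_a", "dr_a", "dr_b", "dr_a", "dr_c"]
pcm_share, team_share = continuity_shares(visits, "dr_a", {"dr_a", "dr_b"})

# Compare against the targets quoted in the text (90% PCM, 70% PCM team).
meets_pcm_target = pcm_share >= 0.90
meets_team_target = team_share >= 0.70
```

In this made-up case the patient sees the PCM for 60% of visits and a team member for 80% of visits, so the team target is met and the PCM target is not.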

Research  Objectives 

Extensive review of the literature on healthcare analysis brings to light a gap
between the data analysis practices used within the private sector’s healthcare system and
those used within the Air Force’s healthcare system. The purpose of this thesis is to bridge
that gap by applying data analysis techniques from the private sector to the
Air Force healthcare system, tailored to the unique characteristics of the Air Force.

The  most  applicable  data  mining  techniques  are  used  to  analyze  the  data  provided 
by the 711th HPW; the analysis identifies the portion of the active duty fliers with MSIs
within  the  Air  Force  that  accounts  for  the  highest  percentage  of  healthcare  costs.  The 
analysis  also  identifies  which  characteristics  and  diagnoses  are  predictive  of  costs  across 
both  low  and  high  cost  groups  and  how  continuity  of  care  impacts  healthcare  costs  and 
patient  availability. 

Investigative  Questions 

A  series  of  investigative  questions  were  developed  to  guide  the  research.  The 
subpopulation referred to below consists of the active duty fliers with MSIs selected for
evaluation  by  this  analysis. 

1)  What  percentage  of  the  subpopulation  contributes  to  a  majority  of  the  healthcare 
costs? 

2)  What  are  the  defining  characteristics  of  the  high  cost  group? 

a.  Which  personal  characteristics  (gender,  age  group,  race  group,  fitness 
information)  correlate  to  higher  healthcare  costs? 

b.  Which  organizational  factors  (military  rank  and  career  field)  account  for 
higher  healthcare  costs? 

3)  How  does  continuity  of  care  impact  healthcare  costs?  Does  the  impact  differ  for 
high vs. low cost populations?

4) How does continuity of care impact patient availability? Does the impact differ
for  high  vs.  low  cost  populations? 

To  test  the  hypothesis  of  the  Pareto  Principle,  this  analysis  begins  with 
determining  whether  or  not  there  is  a  small  portion  of  the  subpopulation  that  contributes 
to  the  majority  of  healthcare  costs.  Once  this  is  established,  the  high  cost  and  low  cost 
groups  are  analyzed  separately  for  comparison  to  determine  which  personal  and 
organizational characteristics are prominent in each group and whether there is evidence that
certain  characteristics  are  predictive  of  costs.  The  research  also  investigates  the  impact 
continuity  of  care  has  on  healthcare  costs  and  patient  availability.  Answering  these 
questions  provides  beneficial  insight  into  the  current  Air  Force  healthcare  model  and  how 
to  better  implement  continuity  of  care. 
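
The first step described above, checking whether a small share of patients accounts for most of the cost, can be sketched as follows. This is an illustrative outline only: the function name, the cost figures, and the 80% cutoff used in the example are assumptions, and the thesis data are not reproduced here.

```python
def smallest_share_for_majority(costs, cost_fraction=0.5):
    """Fraction of patients, taken from most to least costly, whose combined
    cost first reaches cost_fraction of the total cost.

    costs: one total cost per patient (hypothetical values).
    """
    if not costs or sum(costs) <= 0:
        raise ValueError("costs must be non-empty with a positive total")
    target = cost_fraction * sum(costs)
    running = 0.0
    for k, cost in enumerate(sorted(costs, reverse=True), start=1):
        running += cost
        if running >= target:
            return k / len(costs)
    return 1.0


# Made-up costs for ten patients; two patients dominate the total.
patient_costs = [6000, 2500, 400, 300, 200, 100, 200, 100, 100, 100]
share_for_80_percent = smallest_share_for_majority(patient_costs, 0.8)
share_for_half = smallest_share_for_majority(patient_costs, 0.5)
```

With these made-up numbers, 20% of the patients account for at least 80% of the cost, which is the pattern the Pareto Principle predicts. Patients inside that share would form the high cost group and the remainder the low cost group.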

Methodology 

Due  to  the  wide  and  successful  use  of  data  mining  techniques  in  the  healthcare 
industry,  these  methods  are  used  to  analyze  the  Air  Force’s  continuity  of  care  healthcare 
data for fliers with MSIs. Specifically, multivariate linear regression, logistic regression,
and  simple  linear  regression  are  used.  The  results  of  the  data  mining  analysis  reveal  the 
specific  common  characteristics  of  the  high  cost  group  within  the  Air  Force.  These 
characteristics  provide  insight  into  the  specific  demographic  and  organizational  factors 
that  correlate  to  higher  healthcare  costs  of  Air  Force  personnel.  The  results  also  provide 
insight  into  the  Air  Force’s  current  continuity  of  care  model  and  demonstrate  whether 
increased  continuity  of  care  is  correlated  with  decreased  costs  and  increased  patient 
availability  for  fliers  with  MSIs. 
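
As an illustration of the simplest of these techniques, the sketch below fits a simple linear regression of cost on a continuity measure by ordinary least squares. It uses only the Python standard library; the variable names and the data are invented for the example, so the fitted slope says nothing about the results of this research. Multivariate and logistic regression are not shown.

```python
def fit_simple_linear(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data: share of visits with the PCM and total cost per patient.
continuity = [0.2, 0.4, 0.6, 0.8]
cost = [900.0, 800.0, 700.0, 600.0]
slope, intercept = fit_simple_linear(continuity, cost)
```

A negative slope in such a fit would correspond to lower costs at higher continuity; whether the slope differs from zero in real data requires a significance test, which this sketch omits.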

Assumptions and Limitations

In order to perform the analysis, certain assumptions and limitations apply
regarding the data.

Assumptions

• Patient appointment costs only include costs incurred for services rendered;
these costs do not include fixed costs.

• The data are assumed to be accurate.

• The data are assumed to be complete.

Limitations

• Data on duty location, deployment information, and the
aircraft the patient is assigned to could not be obtained.

• For privacy purposes, data are limited to:

    • Age groups; actual age is not included

    • Rank groups as opposed to specific military rank title

    • Appointment year; actual dates of each appointment are not
    included

Data  Scoping  and  Handling 

It is necessary to scope down the problem in order to create a more manageable
dataset. To do this, assumptions are made for this research. First, the population is
reduced to active duty Air Force fliers with musculoskeletal injuries (MSIs). The dataset
is scoped down to Air Force fliers because the Air Force healthcare system accurately
tracks patient availability through pilots’ flying status codes. For non-fliers, “profiles”
are set for those members who are unavailable for duty. Profiles are commonly
unreliable in determining a patient’s actual availability. Profiles frequently expire before
a patient has fully recovered, or are not updated in the system when a patient recovers
earlier than anticipated. Using flying status codes for fliers allows a more accurate
depiction of a patient’s availability. MSIs are considered because they provide a wide
range of costs due to the considerable flexibility in diagnoses, diagnostic methods, and
treatments. Second, healthcare costs include only costs incurred for services
rendered; they do not include administrative or overhead costs, given that these costs are not
specific or influential to a patient’s quality or continuity of care.

This thesis utilizes centralized medical databases maintained by the Air Force
Surgeon General (AF/SG6). All data are stripped of personal identifiers before analysis
is performed. The data are housed on existing computers in the Human Systems
Laboratory at the Air Force Institute of Technology at Wright-Patterson Air Force Base in
Ohio. These computers require Common Access Card enabled access granted to
government employees and contractors, with the data stored in limited permissions
directories.

Preview 

This chapter provides the motivation for further
research on the Air Force’s healthcare system. Chapter II gives a background on the
literature that exists on data mining within private sector healthcare, military healthcare
applications of data mining, and continuity of care within the private sector. Chapter III
gives an overview of the methods and processes used to perform the analysis and answer
the investigative questions. Chapter IV presents the results of the analysis and how they are
interpreted. Chapter V provides the key conclusions to be drawn from the research and
offers recommendations for future research on the topic of healthcare within the Air
Force.

II. Literature Review


Chapter  Overview 

This chapter examines the background and literature of data mining techniques in
healthcare and the impacts of implementing healthcare continuity. Data mining has been
evolving as a more robust way to analyze large datasets. With the development of
electronic healthcare records, data mining is essential to progress and advancement
within the medical community. The use of data mining in healthcare can provide insights
into better treatment regimens and earlier detection and prediction of chronic illnesses.
This chapter reviews the literature by exploring the implementation of data mining
techniques in different areas of the healthcare community. With the DoD having the
most robust healthcare records system in the country (Dolfini-Reed & Jebo, 2000), this
chapter also reviews the applications of data mining within the DoD healthcare
system.

It is hypothesized that continuity of care in a healthcare system decreases a
patient’s likelihood of future hospitalization and increases the quality of care experienced
by the patient (Mainous & Gill, 1998). This chapter reviews the literature that exists on
continuity of care and the impacts continuity of care has on quality of care and healthcare
costs.

Data  Mining 

With advances in technology, data are being collected and stored at a rapid pace.
Data mining assists in managing and analyzing large datasets (Fayyad, Piatesky-Shapiro,
& Smyth, 1996). Data mining, commonly referred to as knowledge discovery in
databases (KDD), is a comprehensive term that describes a combination of statistical and
computer science techniques to discover relationships and patterns within large databases
(Srinivas, Kavihta, & Govrdhan, 2010). Medical databases have been increasing in size,
making traditional data analysis methods much more difficult to apply. Data mining has evolved
from these traditional analysis methods to create algorithms that extract patterns from data.
A variety of data mining methods are utilized across a variety of applications
including marketing, investments, fraud detection, manufacturing, and healthcare
(Fayyad, Piatesky-Shapiro, & Smyth, 1996).

Data  Mining  within  Healthcare 

Given the size of medical records and information, data mining is an essential tool
for healthcare reform and the efficiency of medical processes. The conversion to
electronic medical records over the years has created the ability to gather more healthcare
data (Prather, et al., 1997). With the dramatic growth in the size of medical databases,
manual data analysis is impractical (Fayyad, Piatesky-Shapiro, & Smyth, 1996). Because
of  this,  data  mining  has  become  more  popular  and  critical  within  the  healthcare 
community. 

Multiple data mining techniques are being utilized within the healthcare
community, including factor analysis (Fayyad, Piatesky-Shapiro, & Smyth, 1996),
multivariate analysis (Gilmer, et al., 2005; Reid, et al., 2009), univariate and multivariate
logistic regression (Lv, et al., 2011; Kurth, Glynn, Gaziano, Berger, & Robins, 2006), and
multivariate time series algorithms (Wong, 2004). Extensive research exists in a variety
of different areas of healthcare and data mining, from detecting disease outbreaks to the
implementation of the patient-centered medical home model.

Benefits  of  Healthcare  Data  Mining 

Data  mining  is  used  in  healthcare  to  improve  effectiveness  of  treatments, 
healthcare  management,  and  healthcare  quality  (Koh  &  Tan,  2011).  Effectiveness  of 
treatment  is  a  measure  of  the  effectiveness  of  the  actions  taken  to  move  a  patient  from  an 
unhealthy  state  to  a  healthy  state.  These  actions  incorporate  a  wide  range  of  treatment 
options  including  pharmaceutical  prescriptions,  laboratory  procedures,  and  simple  doctor 
visits.  There  are  several  ways  in  which  data  mining  has  been  used  to  measure  how 
effective a treatment is for an illness. Kincade (1998) analyzes how effective and cost
efficient specific drug regimens are for patients with the same condition. Srinivas (2010)
utilizes  decision  tree  analysis  to  predict  the  potential  for  a  patient  to  experience  a  heart 
attack  based  on  patient  characteristics.  Data  mining  is  useful  in  finding  root  causes  for 
more  effective  treatment. 

Healthcare management is the ability to better track chronic illnesses and manage
them appropriately; successful healthcare management is known to reduce
hospital admissions and claims (Koh & Tan, 2011). Data mining has been used to
mitigate issues of resource usage, manage hospital resources, and predict
inpatient length of stay (Sharma & Mansotra, 2014). Kincade (1998) does this by
categorizing patients according to demographic and medical conditions to help determine
high cost populations based on resource utilization and frequency of visits. Data mining
can help identify areas of risk and improvement and provide valuable information to
make the healthcare management process more effective.

Quality of care is the patient’s satisfaction with the services provided as well as
the short and long-term impacts of these services. Schuerenberg (2003) utilizes decision
tree analysis to improve the quality of healthcare through treatment, disease management,
and cost management. Brannigan (1999) implements a study that uses data mining as a
tool to regulate patient wait times and improve service to patients. Data mining has many
uses that help improve the quality of care for patients.


Continuity  of  Care 

It is hypothesized that continuity of care in a healthcare system decreases a
patient’s likelihood of future hospitalization (Mainous & Gill, 1998), ultimately
decreasing healthcare costs. Generally, a patient’s primary care provider is the first point
of contact in the healthcare system (Balasubramanian, Banerjee, Denton, Naessens, &
Stahl, 2010). Thus, in a long-term physician-patient relationship, a knowledge base is
accrued (Mainous & Gill, 1998). Primary care providers are responsible for preventive
medicine, patient education, routine physical exams, and referring patients to medical
specialties for specialized care (Balasubramanian, Banerjee, Denton, Naessens, & Stahl,
2010). It is believed that physician and patient continuity is fundamental to good primary
healthcare and is effective in reducing healthcare cost (Weiss & Blustein, 1996).

Literature suggests additional benefits of implementing continuity in a healthcare
system include decreases in the number of appointments a patient will need, the number
of laboratory tests needed, and the overall number of emergency room visits (Weiss & Blustein,
1996). A primary care provider that has a relationship with the patient may perform more
cost effectively with respect to diagnosis (De Maeseneer, De Prins, Gosset, &
Heyerick, 2003). Managing the number of appointments is crucial to improving quality
and managing costs (Green, Savin, & Murray, 2007), and the number of appointments needed
can be minimized through increased continuity (Tantau, 2009).

Reports by the Institute for Healthcare Improvement show that 40% of emergency
department cases occurred because patients could not see their primary care provider
(Balasubramanian, Banerjee, Denton, Naessens, & Stahl, 2010). Patients who meet
regularly with their primary care providers are generally more satisfied with the care
provided, more likely to take medications properly, more likely to be properly diagnosed,
and less likely to be hospitalized (Balasubramanian, Banerjee, Denton, Naessens, &
Stahl, 2010). Studies show that continuity of care is effective in lowering emergency
room use and hospitalization and in reducing the number of no-shows for appointments
(Kristjansson, et al., 2013). Quality of care and patient satisfaction are also shown to
increase with continuity (Bjorkelund, et al., 2013).

Much research exists that shows the impact of patients meeting with their primary
care providers on costs. In a survey analysis by Weiss & Blustein (1996),
the results show that patients with high provider continuity (10+ years) experienced
substantially lower costs of care. This cost association was also seen in a study observing
the Belgian healthcare system over a two-year period, which showed that patients who
visited the same family physician had lower total costs for medical care (De Maeseneer,
De Prins, Gosset, & Heyerick, 2003). It is also important not to discount the research that
shows the impact continuity of care has on the quality of care. As noted previously,
increased quality of care can result in reduced appointments, ultimately reducing
healthcare cost. Mainous & Gill (1998) find that high continuity of care decreased the
likelihood of future hospitalization. Anderson et al. (2012) found that medical continuity
was more common among older patients, and higher continuity resulted in a lower
probability of needing emergency care and lower total medical costs. These studies suggest
that increasing continuity will in time increase healthcare quality, ultimately reducing
healthcare costs.

While  many  studies  investigate  continuity  of  care  in  the  private  sector,  limited 
research  exists  on  continuity  of  care  within  the  military  healthcare  system.  Given  the 
unique  nature  of  the  military  healthcare  system  with  the  frequent  movement  and 
deployment  of  its  healthcare  providers  and  members,  the  impacts  of  continuity  of  care  are 
expected to differ in the military healthcare system compared to that of the private
healthcare  system.  Additionally,  while  extensive  research  exists  that  investigates  costs 
and  factors  that  influence  costs,  limited  research  explores  factors  that  influence  patient 
availability.  For  the  military  healthcare  system,  it  is  important  that  patients  have  rapid 
recoveries  in  order  to  be  ready  for  duty.  This  also  differentiates  the  military  healthcare 
system  from  the  private  healthcare  system. 

Data  Mining  Military  Applications 

Data mining is used in the DoD in multiple areas including incidence ratio
analysis to determine the frequency of incidences of cancer within the U.S. Air Force
active duty population (Yamane, 2006), correlation analysis of military personnel to link
illness among Gulf War veterans (Bose & Mahapatra, 2001), and a scoring model to
determine if patients with diabetes would require readmission (Ramachandran,
Erraguntla, Mayer, & Benjamin, 2007). While military data mining applications are
varied, limited research has focused specifically on cost, patient availability, or continuity
of care within the military healthcare system. This research begins to close that gap
through an analysis of these factors for Air Force active duty pilots with musculoskeletal
injuries.

Conclusion 

The  purpose  of  this  literature  review  is  to  provide  the  background  on  the  literature 
that  exists  on  data  mining  within  healthcare.  The  chapter  defines  and  provides  an 
overview  of  data  mining.  Next,  this  chapter  discusses  data  mining  and  how  it  has  been 
used  in  the  healthcare  field  in  the  private  sector  and  its  benefits.  Then,  the  chapter 
examines  the  literature  on  continuity  of  care  and  its  benefits  in  the  private  sector.  Last, 
this  chapter  explores  data  mining  applications  in  the  military. 

III. Methodology


Chapter  Overview 

The purpose of this chapter is to describe the methods used to analyze the factors
that  contribute  to  healthcare  cost  and  patient  availability.  The  chapter  first  examines  the 
process  of  data  gathering,  collection,  and  formatting  to  prepare  data  for  analysis.  Next,  it 
explores  the  problem  formulation  needed  to  effectively  answer  the  investigative 
questions.  Then,  the  three  phases  of  the  analysis  process  are  explained  along  with  the 
details  of  each  step  of  the  analysis  process  and  the  investigative  questions  that  are  being 
answered  at  each  step.  Lastly,  the  chapter  summarizes  the  information  covered. 

Scoping  of  Data 

Healthcare  data  are  obtained  from  the  Air  Force’s  CarePoint  site.  The 
subpopulation chosen for this study is active duty fliers whose diagnosis results in them
being  off  of  flying  status  due  to  musculoskeletal  injuries  (MSIs).  Active  duty  fliers  are 
chosen  because  of  the  accurate  records  kept  on  whether  a  patient  is  available  for  duty  via 
their  flying  status;  this  allows  a  more  accurate  way  of  tracking  patient  availability  than  is 
possible  for  non-flying  military  personnel. 

MSIs are chosen as the diagnosis of choice based on their frequency amongst
fliers due to the strenuous activity associated with flying (Tvaryanas, 2014). In addition,
MSIs provide a variety of different diagnosis types, from less severe diagnoses such as
back pain and joint pain to more severe diagnoses such as bone disease and injuries of the
spine. MSIs also involve a wide range of tools and procedures used to diagnose and
treat them (Tvaryanas, 2014). Since MSI diagnoses, diagnostics, and treatments are so
diverse, this set of conditions allows the results of the continuity of care analysis to be
more generalizable to other non-MSI diagnoses.


Problem  Formulation 

The  two  dependent  variables  considered  are  healthcare  cost  and  patient 
availability.  Healthcare  costs  are  the  costs  associated  with  providing  care;  these  are  costs 
associated  with  medical  procedures,  pharmaceutical  prescriptions,  and  laboratory  tests. 
Total costs are considered for each patient appointment over a five-year period (July
2009-June  2014).  Patient  availability  is  calculated  as  the  length  of  time  a  patient  is  off  of 
flying  status  cumulatively  from  2009.  The  independent  variables  considered  are  the 
personal  and  organizational  factors  listed  in  Table  1. 


Table 1: Independent Variables

Gender                                  Career Field
    Male                                    1. Pilot
    Female                                  2. Combat Systems Officer
Age                                         3. Aircrew
    Ages 19-29                              4. Command and Control
    Ages 30-39                              5. Aircrew Protection
    Ages 40-49                              6. Flight Nurse
    Ages 50+                                7. Aerospace Medicine Specialist
Race Group                                  8. Aerospace Medical Service
    Asian or Pacific Islander               9. Air Battle Manager / Special
    Black, not Hispanic                        Tactics / Combat Rescue / Space
    Hispanic                                   Officers
    Other/Unknown
    White, not Hispanic
Fitness Information                     Military Rank / Level of Experience
    Height                                  Junior Enlisted
    Weight                                  Senior Enlisted
    Physical Fitness Test Run Score         Junior Officer
    Physical Fitness Test Score             Senior Officer
    Abdominal Circumference

Personal factors are those unique to the patient that the Air Force cannot control.
The personal factors considered in this study are:

•  Gender - Measured as a binary variable, with “1” for male and “0” for female.

•  Age group - Consists of the 4 dummy variables listed in Table 1, one data field
per age group. Each is a binary variable equal to “1” if the patient is in that age
group and “0” if the patient is not.

•  Race group - Consists of the 5 dummy variables listed in Table 1, one data field
per race group. Each is a binary variable equal to “1” if the patient is in that race
group and “0” if the patient is not.

•  Fitness information - Includes height, weight, and abdominal circumference.
Physical fitness test run score and physical fitness test score are measured on a
0 to 100 scale, with 100 being the best score.

•  Military rank - Consists of the 4 dummy variables listed in Table 1, one data
field per military rank group. Each is a binary variable equal to “1” if the patient
is in that rank group and “0” if the patient is not. The ranks included in each rank
group are listed below:

    •  Junior Enlisted: Airman Basic, Airman, Airman First Class, Senior
    Airman

    •  Senior Enlisted: Staff Sergeant, Technical Sergeant, Master Sergeant,
    Senior Master Sergeant, Chief Master Sergeant

    •  Junior Officer: Second Lieutenant, First Lieutenant, Captain

    •  Senior Officer: Major, Lieutenant Colonel, Colonel, General Officers

•  Career field - Consists of the 9 dummy variables listed in Table 1, one data field
per career field. Each is a binary variable equal to “1” if the patient is in that
career field and “0” if the patient is not.
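The dummy coding described in the list above can be illustrated with a short sketch. The category levels come from Table 1; the function names and the dictionary layout are hypothetical and are not taken from the study's data files.

```python
# Category levels are taken from Table 1; everything else is illustrative.
AGE_GROUPS = ["Ages 19-29", "Ages 30-39", "Ages 40-49", "Ages 50+"]
RANK_GROUPS = ["Junior Enlisted", "Senior Enlisted",
               "Junior Officer", "Senior Officer"]


def dummy_code(value, levels):
    """Return one 0/1 indicator per level; exactly one indicator is 1."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return {level: int(level == value) for level in levels}


def encode_patient(gender, age_group, rank_group):
    """Hypothetical helper: build the binary fields for one patient record."""
    row = {"Male": int(gender == "Male")}  # "1" for male, "0" for female
    row.update(dummy_code(age_group, AGE_GROUPS))
    row.update(dummy_code(rank_group, RANK_GROUPS))
    return row


example = encode_patient("Female", "Ages 30-39", "Junior Officer")
print(example)
```

When indicators like these enter a regression that includes an intercept, one level per factor is normally left out as the reference category, since the full set of indicators for a factor always sums to 1.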

Investigative  Questions 

1)  What percentage of the population contributes to a majority of the healthcare
costs?

2)  What  are  the  defining  characteristics  of  the  high  cost  group? 

a.  Which  personal  characteristics  (gender,  age  group,  race  group,  fitness 
information)  correlate  to  higher  healthcare  costs? 

b.  Which  organizational  factors  (military  rank  and  career  field)  account 
for  higher  healthcare  costs? 

3)  How  does  continuity  of  care  impact  healthcare  costs?  Does  the  impact  differ 
for  high  vs.  low  cost  populations? 

4)  How does continuity of care impact patient availability? Does the impact
differ for high vs. low cost populations?

First, it is important to begin the analysis by identifying the high cost group and the
defining characteristics of both the low and high cost groups. This helps target specific
groups in which improvements to healthcare costs could be most effective. Next,
understanding the impact continuity of care has on patient appointment cost is important
to help manage rising healthcare costs in the Air Force. Lastly, patient availability is
essential in the Air Force’s healthcare system because Air Force members need to be
ready to deploy and support the Air Force’s mission. Understanding how continuity of
care impacts patient availability is important for establishing effective policies and
requirements on continuity of care. Answering these investigative questions will provide
beneficial insight into the current Air Force healthcare system and its effectiveness.


Methodology  Phases 

To organize the research process, the method is divided into three separate phases.
Each phase applies the analysis appropriate to the investigative questions it
addresses.

1. Data Collection

2. Defining Cost Groups

3. Regression Analysis

Data Collection

Data are collected from multiple sources: the Aviation Safety Information
Management System (ASIMS), the Air Force Military Personnel Database (mil_pers), the
Air Force Fitness Management System (AFFMS), the Cardiac Risk Management
database (CRAM), and the Comprehensive Ambulatory/Professional Encounter Record
(CAPER) database. This study has an approved Institutional Review Board (IRB) review
and Health Insurance Portability and Accountability Act (HIPAA) waiver (Appendix A:
IRB Approval Letters). A data broker removed personally identifiable information from
the data prior to this analysis. The data obtained from the above databases are detailed
below:

ASIMS: Contains information regarding duty, mobility, and flying status (patient
availability).

Mil_pers: Contains personal and organizational factors that include the career
field, gender, military rank, and age group data fields.

AFFMS: Contains data reflecting the results of patients’ biannual physical fitness
assessment; the assessment measures cardiac ability through a 1.5 mile run,
number of push-ups and sit-ups completed in one minute, body mass index (BMI),
height, and weight.

CAPER: Contains detailed information regarding patient medical appointments.
This database gives appointment cost information that includes procedural,
pharmaceutical, and laboratory costs. It also includes details that show the diagnosis
type, continuity of care information, and the year in which the patient was seen.

The data analyzed are for Air Force active duty fliers who are off of flying status
due to an MSI diagnosis as of July of 2009; these data cover a five year period of patient
appointment history from July 2009 to June 2014. Thus, active duty fliers with MSIs in
July of 2009 are defined as the subpopulation for which analysis is conducted. Once
collected, the data must be put into a form that can be analyzed to answer the
investigative questions. The assumptions made and the formatting applied are listed
below:

Data  Assumptions 

■  Continuity of care only exists if a patient has more than one appointment;
patients with one appointment are removed from the dataset.

■  Patient availability is defined as the number of days a patient is available to
fly; since the data are limited to flying status year, as opposed to flying status
date, availability is examined cumulatively, starting with 2009 and ending
at 2012. Patient availability is not calculated beyond 2012 because all
patients had returned to flying status or had separated from the Air Force
by the end of 2012.

Data  Formatting 

■  Medical  appointments  without  at  least  one  MSI  diagnosis  were  removed. 

■  61% of patients have appointments with missing provider IDs; to account
for this, analysis is performed using two different scenarios:

    ■  Best case scenario: All appointments with blank provider ID entries are
    assumed to be appointments with the same provider

    ■  Worst case scenario: All appointments with blank provider ID entries are
    assumed to be appointments with different providers

■  MSI  diagnoses  were  broken  into  four  types: 

■  Arthropathies  -  Diseases  of  the  joints  /  joint  inflammation 

■  Dorsopathies  -  Spinal  disease  /  injuries  of  the  back 

■  Rheumatism  -  Pain  associated  with  joints  and  connective  tissues 
(back  pain,  neck  pain  and  osteoarthritis) 

■  Osteopathies,  chondropathies,  and  acquired  musculoskeletal 
deformities  -  Diseases  associated  with  bones  or  cartilage 
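The two scenarios for blank provider IDs can be sketched as follows. This is a minimal illustration, not the study's actual procedure: it assumes continuity of care is the fraction of appointments with the designated primary care manager (the definition given in the Regression Analysis section), and it assumes that "the same provider" in the best case means that primary care manager. The provider IDs and the function name are hypothetical.

```python
def continuity_of_care(provider_ids, pcm_id, scenario):
    """Fraction of appointments counted as visits with the primary care manager.

    provider_ids: one entry per appointment; None marks a blank provider ID.
    scenario: "best" counts blank entries as visits with the PCM (assumption),
              "worst" counts them as visits with some other provider.
    """
    if scenario not in ("best", "worst"):
        raise ValueError("scenario must be 'best' or 'worst'")
    if len(provider_ids) < 2:
        # Patients with a single appointment are excluded from the dataset.
        raise ValueError("continuity requires more than one appointment")
    matches = 0
    for pid in provider_ids:
        if pid is None:
            matches += 1 if scenario == "best" else 0
        elif pid == pcm_id:
            matches += 1
    return matches / len(provider_ids)


visits = ["P1", None, "P2", None, "P1"]  # hypothetical appointment history
best = continuity_of_care(visits, "P1", "best")    # 4 of 5 appointments
worst = continuity_of_care(visits, "P1", "worst")  # 2 of 5 appointments
print(best, worst)
```

Running both scenarios brackets the continuity value a patient could have had if the missing IDs were known, which is why each regression is repeated under both.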
Defining Cost Groups

Data mining is vital to understanding the issues related to fliers and the specific
organizational and personal factors that contribute to healthcare cost. To begin the
analysis, cost groups are identified by taking the top percentages of the
highest cost patients and calculating the percentage of total costs these patients account
for. Identifying the different cost groups allows the data to be analyzed in smaller,
more homogeneous subsets so that dominant groups do not unduly influence the results. To
establish the cost groups, each patient’s total appointment costs are aggregated over the
five year period. Once each patient’s costs are calculated, patients are sorted in order by
their total appointment costs. Potential cost groups are identified by their percentage
contribution to total costs; the cost groups considered are the top 5%, 10%, 15%, 20%,
25%, and 30%. The break out that comes closest to the 80% hypothesized by the Pareto
Principle is selected as the high cost group. The results of this cost group identification
answer question 1 regarding identifying the percentage of the population that contributes
to the preponderance of the healthcare costs.
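The cost group selection described above can be sketched in a few lines. The patient costs below are invented for illustration, and the function names are hypothetical; only the candidate cut-offs (top 5% to 30%) and the 80% Pareto target come from the text.

```python
def cost_share_of_top(costs, fraction):
    """Share of total cost contributed by the top `fraction` of patients."""
    ranked = sorted(costs, reverse=True)          # highest cost patients first
    n_top = max(1, round(len(ranked) * fraction))
    return sum(ranked[:n_top]) / sum(ranked)


def pick_high_cost_cutoff(costs,
                          fractions=(0.05, 0.10, 0.15, 0.20, 0.25, 0.30),
                          target=0.80):
    """Return the candidate cut-off whose cost share is closest to the target."""
    shares = {f: cost_share_of_top(costs, f) for f in fractions}
    best = min(fractions, key=lambda f: abs(shares[f] - target))
    return best, shares


# Hypothetical five-year totals for 20 patients (not study data).
costs = [5000, 3000, 2000, 1500, 1000, 800] + [100] * 14
cutoff, shares = pick_high_cost_cutoff(costs)
for fraction, share in shares.items():
    print(f"top {fraction:.0%} of patients -> {share:.1%} of costs")
print("selected cut-off:", cutoff)
```

For this invented dataset the rule selects the top 20%. Applied to the shares the study reports in Table 2, the same rule selects the top 30% of patients, who account for 70% of costs.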

Regression  Analysis 

Once the cost groups have been identified, analysis of variance (ANOVA) tests
are performed on the two cost groups separately to test for attributes that are predictive of
costs. Minitab (Version 15) is used to develop the initial ANOVA tables. Next,
multivariate regression is used to quantify the impacts that personal and organizational
factors have on healthcare costs. For the multivariate regression, the variables gender,
age, race, rank, and career field are used as independent variables, with cost as the
response variable. P-values from simple linear regressions are evaluated as a screening
experiment, using a threshold of 0.05 to determine if a variable is predictive of costs.
The characteristics identified with p-values less than 0.05 are included in the multivariate
regression as predictive of cost within that given cost group. Logistic regression is
performed to determine which characteristics are predictive of which cost
group a patient belongs to. The results of these ANOVAs and regression analyses answer
question 2 regarding identifying the cost groups and the defining characteristics of those
cost groups.

Separate simple linear regressions are performed with continuity of care as the
independent variable and patient appointment cost and patient availability as the
response variables. Continuity of care is defined as the percentage of times the patient meets
with their designated primary care manager for an illness, whereas patient availability is
defined as the cumulative number of days a patient was available to fly since 2009. The
p-value for each simple linear regression equation is evaluated; cases in which the
p-value is less than or equal to 0.05 are considered statistically significant.

The simple linear regression analysis for continuity of care against patient
appointment cost is tested separately for each cost group, each diagnosis type,
and each scenario type. The cost groups consist of the low cost group, the high cost group,
and all patients combined into a single group. The diagnosis types are arthropathies,
dorsopathies, rheumatism, osteopathies, and all patient diagnoses, including both MSI
and non-MSI diagnoses (MSI patients may have non-MSI diagnoses in the same
appointment as an MSI diagnosis). The scenarios are the best case (where blank
provider IDs are considered the same provider) and the worst case (where blank
provider IDs are considered to be different providers). Given that there are 3 cost
groups, 5 different diagnosis types, and 2 different scenarios, a total of 30 simple linear
regression graphs and equations are generated. This regression analysis answers question
3 regarding whether continuity of care impacts healthcare cost.

The simple linear regression analysis for continuity of care against patient
availability is tested separately for each cost group, cumulative calendar year, and
scenario type. Here the cost groups are the low cost group and the high cost group. The
cumulative calendar years are 2009, 2010, 2011, and 2012. The scenario types are the
best case and worst case scenarios. Given that there are 2 cost groups, 4 cumulative
years, and 2 different scenarios, 16 total simple linear regression graphs and equations are
generated for continuity of care vs. patient availability. This regression analysis answers
question 4, which asks whether continuity of care impacts patient availability.
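The counts of 30 and 16 regressions follow from crossing the factors named above. The short enumeration below makes the combinations explicit; the variable names are illustrative, while the factor levels are those listed in the text.

```python
from itertools import product

scenarios = ["best case", "worst case"]

# Question 3: continuity of care vs. patient appointment cost.
cost_groups_q3 = ["low cost", "high cost", "all patients"]
diagnosis_types = ["arthropathies", "dorsopathies", "rheumatism",
                   "osteopathies", "all diagnoses"]
cost_runs = list(product(cost_groups_q3, diagnosis_types, scenarios))

# Question 4: continuity of care vs. patient availability.
cost_groups_q4 = ["low cost", "high cost"]
cumulative_years = [2009, 2010, 2011, 2012]
availability_runs = list(product(cost_groups_q4, cumulative_years, scenarios))

print(len(cost_runs), "cost regressions")                  # 3 * 5 * 2
print(len(availability_runs), "availability regressions")  # 2 * 4 * 2
```

Each tuple in `cost_runs` or `availability_runs` identifies one subset of the data on which a separate simple linear regression is fitted.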

Conclusion 

The purpose of this chapter is to provide the methodology for analyzing the
impact continuity of care has on fliers with MSIs within the Air Force’s healthcare
system. The methods employed are multivariate linear regression, simple linear
regression, and logistic regression. First, the characterization of the patients is
determined for the different cost groups. Next, the influence continuity of care has on
healthcare cost and patient availability is evaluated and compared for both cost groups.
These methods are sufficient to answer whether continuity of care
impacts fliers with MSIs within the Air Force’s healthcare system.


IV. Analysis and Results


Chapter  Overview 

The purpose of this chapter is to present the results of the regression analysis
completed to answer the investigative questions regarding the impact continuity of
care has on fliers with musculoskeletal injuries (MSIs). The results of this analysis
provide beneficial insights into the Air Force’s current continuity of care model
implemented in its healthcare system.

The  investigative  questions  are  divided  into  two  categories:  demographic 
characterization  and  continuity  of  care.  The  demographic  characterization  analysis  is 
performed  to  determine  which  proportion  of  the  population  is  high  cost  and  which 
proportion  is  low  cost.  Multivariable  regression  is  performed  to  determine  if  there  are 
defining  characteristics  that  make  up  each  group;  logistic  regression  is  performed  to 
evaluate  if  it  can  be  determined  which  group  a  patient  belongs  to  based  upon  known 
characteristics.  For  the  continuity  of  care  analysis,  simple  linear  regression  is  performed 
to  determine  in  which  instances  continuity  of  care  has  influence  over  patient  appointment 
costs  and  patient  availability. 

Assumptions  and  Data  Formatting 

The  dataset  is  comprised  of  patient  appointment  and  characteristic  information 
from  July  2009  through  June  2014.  The  dataset  includes  all  patients  that  are  off  of  flying 
status  due  to  an  MSI  in  July  of  2009,  and  follows  their  medical  appointment  history 
through June 2014. To represent the data as clearly and accurately as possible,
several assumptions are made and data formatting is performed.

Data Assumptions

■  Continuity of care only exists if a patient has more than one appointment;
patients with one appointment are removed from the database.

■  Patient availability is defined as the number of days a patient is available to
fly in a given year. Because the data are limited to flying status year, as
opposed to flying status date, availability is examined cumulatively,
starting with 2009 and ending at 2012. Patient availability is not
calculated beyond 2012 because no patients from the July 2009 group are
off of flying status due to an MSI beyond 2012.

Data Formatting

■  Patients must have at least one medical appointment with an MSI
diagnosis to be included.

■  Numerous patients have appointments with missing provider IDs; to
account for this, analysis is performed using two different scenarios:

    ■  Best case scenario: All blank provider ID entries are interpreted
    as appointments with the same provider

    ■  Worst case scenario: All blank provider ID entries are interpreted
    as appointments with different providers

■  MSI diagnoses were broken into four types:

    ■  Arthropathies - Diseases of the joints / joint inflammation

    ■  Dorsopathies - Spinal disease / injuries of the back

    ■  Rheumatism - Pain associated with joints and connective tissues
    (back pain, neck pain and osteoarthritis)

    ■  Osteopathies, chondropathies, and acquired musculoskeletal
    deformities - Diseases associated with bones or cartilage


Demographic  Characterization 
Cost  Profiles 

Patient  appointment  costs  are  summed  over  the  five  year  period  for  each  patient 
for  all  appointments  that  include  at  least  one  MSI  diagnosis,  yielding  a  total  cost  per 
patient.  Patients  are  sorted  in  order  by  their  patient  total  costs;  starting  with  the  highest 
cost  patients,  the  top  5%  -  30%  (in  increments  of  5%)  are  calculated  along  with  their 
associated  percentage  of  the  subpopulation  total  cost.  Table  2  contains  the  percentage  of 
the  subpopulation  and  their  associated  percentage  of  subpopulation  total  costs.  This 
break out is used to identify the division of the subpopulation that best represents the
80-20 split hypothesized by the Pareto Principle. The top 30% of patients that make up 70%
of  the  subpopulation  total  costs  are  chosen  as  the  high  cost  group  while  the  bottom  70% 
of  patients  that  make  up  30%  of  subpopulation  total  costs  are  the  low  cost  group. 


Table 2: Cost Profile Table

Percentage of People   Percentage of Costs
                  5%                   27%
                 10%                   40%
                 15%                   50%
                 20%                   58%
                 25%                   65%
                 30%                   70%


Organizational and Personal Characteristics


Table  3  displays  summary  characteristics  for  each  of  the  cost  groups.  The 
compositions  of  each  cost  group  in  terms  of  personal  and  organizational  characteristics 
are  relatively  similar  for  both  profiles.  The  largest  difference  to  note  is  the  age  category, 
which  has  a  higher  proportion  of  patients  ages  30-39  in  the  high  cost  group  and  a  higher 
proportion  of  patients  ages  40-49  in  the  low  cost  group. 


Table 3: Characterization Table

Gender                                   Low Cost   High Cost
Male                                           85%
Female                                         15%

Age                                      Low Cost   High Cost
Ages 19-29                                    15%         11%
Ages 30-39                                    38%         52%
Ages 40-49                                    43%         33%
Ages 50+                                       4%          4%

Race                                     Low Cost   High Cost
Asian or Pacific Islander                      3%          4%
Black, not Hispanic                            5%          6%
Hispanic                                       4%          6%
Other/Unknown                                  5%          4%
White, not Hispanic                           82%         81%

Rank                                     Low Cost   High Cost
Junior Enlisted                               11%         15%
Senior Enlisted                               29%         33%
Junior Officer                                15%         12%
Senior Officer                                44%         39%

Career Field                             Low Cost   High Cost
Pilot                                         36%         32%
Combat Systems Officer                        11%         10%
Air Battle Manager / Special Tactics /
Combat Rescue / Space Officers                 6%          4%
Aircrew                                       34%         35%
Command and Control                            6%          9%
Aircrew Protection                             0%          2%
Flight Nurse                                   3%          4%
Aerospace Medicine Specialist                  3%          2%
Aerospace and Operational Physiology           0%          0%
Aerospace Medical Service                      1%          3%

For each cost group, analysis of variance (ANOVA) tests are run for each categorical
characteristic against the response variable, patient appointment costs, to determine
whether the mean cost differs across the levels of the characteristic. If the mean of one
level differs from that of another, then the values
of that characteristic could be predictive of patient appointment costs. A p-value
threshold of 0.05 is used for statistical significance. Figure 1 lists the resulting ANOVA
tables for each characteristic for the low cost group. All p-values are > 0.05; therefore,
the tests fail to reject the null hypothesis that the mean values are the same
for the levels of each characteristic. Thus, there are no characteristics that are predictive
of patient appointment costs in the low cost group. Figure 2 shows the resulting ANOVA
tables for each characteristic for the high cost group. Again, all p-values are > 0.05, so
the tests fail to reject the null hypothesis that the mean values are the same for the
levels of each characteristic. Thus, there are no characteristics that are predictive of
patient appointment costs in either the high cost or the low cost group.
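The F statistics and R-Sq values in the ANOVA tables can be checked directly from the reported sums of squares and degrees of freedom. The sketch below does this for the Gender and Race Group tables of the low cost group (Figure 1). The function name is illustrative, and the p-values themselves are not recomputed because that requires the F distribution, which is outside the Python standard library.

```python
def anova_summary(ss_factor, df_factor, ss_error, df_error):
    """Recompute mean squares, F, and R-squared for a one-way ANOVA table."""
    ms_factor = ss_factor / df_factor      # between-level mean square
    ms_error = ss_error / df_error         # within-level (error) mean square
    f_stat = ms_factor / ms_error
    r_sq = ss_factor / (ss_factor + ss_error)
    return ms_factor, ms_error, f_stat, r_sq


# Values as reported for the low cost group in Figure 1.
gender = anova_summary(504656, 1, 1799850376, 515)
race = anova_summary(29390561, 4, 1770964472, 512)

print(f"Gender:     F = {gender[2]:.2f}, R-Sq = {gender[3]:.2%}")
print(f"Race Group: F = {race[2]:.2f}, R-Sq = {race[3]:.2%}")
```

The recomputed values agree with the reported F statistics of 0.14 and 2.12. An R-Sq below 2% means the factor accounts for almost none of the variation in appointment costs, which is consistent with the non-significant p-values.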

Additionally, it is important to note the large confidence intervals for the
underrepresented categories within each factor. For example, the confidence
intervals for the race groups other than White, not Hispanic are much wider than that of
the White, not Hispanic race group. This reflects the large difference between the
sample sizes of the majority categories and the minority categories. Thus, the lack of
statistical difference between the means for these ANOVAs is at least partially due to the
small sample sizes for some categories. There may actually be statistical differences
between the mean costs of each category, but increased
sample sizes for the minority categories would be needed to validate this.
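The effect of sample size on interval width can be made concrete with the low cost Race Group values from Figure 1 (pooled standard deviation 1860, N = 18 for Asian or Pacific Islander, N = 425 for White, not Hispanic). The sketch below uses the normal critical value 1.96 as an approximation; Minitab's intervals are based on the t distribution with the error degrees of freedom, so its values differ slightly.

```python
import math


def ci_half_width(pooled_sd, n, critical=1.96):
    """Approximate half-width of a 95% interval for a level mean."""
    return critical * pooled_sd / math.sqrt(n)


POOLED_SD = 1860  # pooled StDev reported for the low cost Race Group ANOVA

asian = ci_half_width(POOLED_SD, 18)    # Asian or Pacific Islander, N = 18
white = ci_half_width(POOLED_SD, 425)   # White, not Hispanic, N = 425

print(f"N = 18:  mean +/- {asian:.0f}")
print(f"N = 425: mean +/- {white:.0f}")
print(f"width ratio: {asian / white:.1f}")
```

Because the half-width scales with one over the square root of N, the interval for the smallest category is nearly five times as wide as that for the largest, so only very large differences in mean cost could be detected for the small categories.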
One-way ANOVA: Appt Costs versus Gender

Source   DF          SS       MS     F      P
Gender    1      504656   504656  0.14  0.704
Error   515  1799850376  3494855
Total   516  1800355032

S = 1869   R-Sq = 0.03%   R-Sq(adj) = 0.00%

Level     N  Mean  StDev
Female   50  2643   1787
Male    467  2749   1878

Pooled StDev = 1869


One-way ANOVA: Appt Costs versus Race Group

Source       DF          SS       MS     F      P
Race Group    4    29390561  7347640  2.12  0.077
Error       512  1770964472  3458915
Total       516  1800355032

S = 1860   R-Sq = 1.63%   R-Sq(adj) = 0.86%

Level                        N  Mean  StDev
Asian or Pacific Islander   18  1995   1472
Black, not Hispanic         26  2892   1852
Hispanic                    20  2902   1707
Other/Unknown               28  1971   1666
White, not Hispanic        425  2803   1892

Pooled StDev = 1860


One-way ANOVA: Appt Costs versus Career Field

Source         DF          SS       MS     F      P
Career Field    7    30489420  4355631  1.25  0.272
Error         509  1769865612  3477143
Total         516  1800355032

S = 1865   R-Sq = 1.69%   R-Sq(adj) = 0.34%

Level                            N  Mean  StDev
Aerospace Medical Service        4  3450   1651
Aerospace Medicine Specialist   18  2788   2059
Aircrew                        177  2870   1953
Air Battle Manager / Special
Tactics / Combat Rescue /
Space Officers                  32  3074   1962
Aircrew Protection              29  3161   1942
Combat Systems Officer          56  2786   1697
Flight Nurse                    16  2868   1782
Pilot                          185  2443   1786

Pooled StDev = 1865


One-way ANOVA: Appt Costs versus Age Group

Source      DF          SS       MS     F      P
Age Group    3     6794820  2264940  0.65  0.585
Error      513  1793560212  3496219
Total      516  1800355032

S = 1870   R-Sq = 0.38%   R-Sq(adj) = 0.00%

Level         N  Mean  StDev
Ages 19-29   77  2659   1814
Ages 30-39  195  2875   1888
Ages 40-49  222  2674   1906
Ages 50+     23  2464   1484

Pooled StDev = 1870


One-way ANOVA: Appt Costs versus Rank Group

Source       DF          SS       MS     F      P
Rank Group    3    20070981  6690327  1.93  0.124
Error       513  1780284051  3470339
Total       516  1800355032

S = 1863   R-Sq = 1.11%   R-Sq(adj) = 0.54%

Level              N  Mean  StDev
Junior Enlisted   58  2660   1765
Junior Officer    79  2776   1785
Senior Enlisted  152  3021   2003
Senior Officer   228  2557   1816

Pooled StDev = 1863


Figure 1: Low Cost Group ANOVA Tables
One-way ANOVA: Appt Costs versus Gender

Source   DF          SS       MS     F      P
Gender    1      504656   504656  0.14  0.704
Error   515  1799850376  3494855
Total   516  1800355032

S = 1869   R-Sq = 0.03%   R-Sq(adj) = 0.00%

Level     N  Mean  StDev
Female   50  2643   1787
Male    467  2749   1878

Pooled StDev = 1869


One-way ANOVA: Appt Costs versus Race Group

Source       DF           SS         MS     F      P
Race Group    4    176684757   44171189  0.38  0.821
Error       216  24932243121  115427051
Total       220  25108927878

S = 10744   R-Sq = 0.70%   R-Sq(adj) = 0.00%

Level                        N   Mean  StDev
Asian or Pacific Islander    8  12128   4128
Black, not Hispanic         13  14735   7699
Hispanic                    14  13438   7411
Other/Unknown                8  13849   5294
White, not Hispanic        178  15687  11447

Pooled StDev = 10744


One-way ANOVA: Appt Costs versus Career Field

Source         DF           SS         MS     F      P
Career Field    8    678627816   84828477  0.74  0.660
Error         212  24430300062  115237264
Total         220  25108927878

S = 10735   R-Sq = 2.70%   R-Sq(adj) = 0.00%

Level                           N   Mean  StDev
Aerospace Medical Service       7  17378  12598
Aerospace Medicine Specialist   4  24478  13823
Aircrew                        78  14514   9119
Air Battle Manager / Special
Tactics / Combat Rescue /
Space Officers                  8  19118  18410
Aircrew Protection             19  14244  10102
Combat Systems Officer         22  16061  15261
Command and Control             4  12828   8181
Flight Nurse                    9  12256   4336
Pilot                          70  15565  10109

Pooled StDev = 10735


One-way ANOVA: Appt Costs versus Age Group

Source      DF           SS         MS     F      P
Age Group    3    451522854  150507618  1.32  0.267
Error      217  24657405024  113628595
Total      220  25108927878

S = 10660   R-Sq = 1.80%   R-Sq(adj) = 0.44%

Level         N   Mean  StDev
Ages 19-29   25  13917   8246
Ages 30-39  115  14368   9390
Ages 40-49   73  16744  12990
Ages 50+      8  19651  10855


One-way ANOVA: Appt Costs versus Rank Group

Source       DF           SS         MS     F      P
Rank Group    3    258195169   86065056  0.75
Error       217  24850732709  114519506
Total       220  25108927878

S = 10701   R-Sq = 1.03%   R-Sq(adj) = 0.00%

Level             Mean  StDev
Junior Enlisted  12832   7644
Junior Officer   16075  11269
Senior Enlisted  15397  10055
Senior Officer   15931  12000


Figure 2: High Cost Group ANOVA Tables

An additional method, multivariate regression, is tested to determine which
characteristics are predictive of a patient’s appointment costs. In step-wise form,
characteristics with p-values above 0.15 are removed until the characteristics left have
p-values close to or below 0.05. Table 4 shows the results of the final multivariate
regression on the low cost group. The characteristics that remain in this regression
equation are the Asian or Pacific Islander race group, the Other/Unknown race group, and the
pilot career field. With an adjusted r-squared value of 0.024, this model is not very
predictive of costs; therefore, there may be other characteristics not included in this
dataset that explain the variability in patient appointment costs for the low cost group.


Table 4: Low Cost Group Multivariate Regression Table

The regression equation is

Patient_Appointment_Costs = 2972 - 802*Asian or Pacific Islander - 824*Other/Unknown - 450*Pilot

Predictor                    Coef  SE Coef      T      P
Constant                   2972.1    104.1  28.55      0
Asian or Pacific Islander  -802.1    443.3  -1.81  0.071
Other/Unknown              -824.4      359   -2.3  0.022
Pilot                      -450.1    169.4  -2.66  0.008

S = 1845.53   R-Sq = 2.9%   R-Sq(adj) = 2.4%
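To show how the coefficients in Table 4 are read, the sketch below evaluates the fitted equation for a few combinations of the three indicator variables. The coefficients are those reported in Table 4; the function and argument names are illustrative. Given the adjusted r-squared of 2.4%, the outputs illustrate the meaning of the coefficients and should not be read as reliable cost predictions.

```python
# Coefficients as reported in Table 4 (low cost group).
COEFS = {
    "Constant": 2972.1,
    "Asian or Pacific Islander": -802.1,
    "Other/Unknown": -824.4,
    "Pilot": -450.1,
}


def predicted_cost(asian_or_pacific_islander=0, other_unknown=0, pilot=0):
    """Evaluate the Table 4 equation; each argument is a 0/1 indicator."""
    return (COEFS["Constant"]
            + COEFS["Asian or Pacific Islander"] * asian_or_pacific_islander
            + COEFS["Other/Unknown"] * other_unknown
            + COEFS["Pilot"] * pilot)


baseline = predicted_cost()                       # all indicators 0
pilot_only = predicted_cost(pilot=1)              # pilot, other race groups
other_pilot = predicted_cost(other_unknown=1, pilot=1)

print(round(baseline, 1), round(pilot_only, 1), round(other_pilot, 1))
```

The constant is the fitted five year cost for a patient with all three indicators equal to 0, and each coefficient is the fitted difference from that baseline when the corresponding indicator equals 1.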


Alternatively, for the high cost group there is only one characteristic that meets
the criteria for inclusion in the multivariate regression. With a p-value of 0.083, the flight
nurse career field characteristic is slightly above the value of 0.05 for statistical
significance. Table 5 shows the regression equation and r-squared values for this
equation. With an adjusted r-squared value of 0.009, this model is not predictive of costs;
therefore, there may be other characteristics not included in this dataset that explain the
variability in patient appointment costs for the high cost group.


Table 5: High Cost Group Multivariate Regression Table

The regression equation is

Patient_Appointment_Costs = 15124 + 9354*Flight Nurse

Predictor        Coef  SE Coef      T      P
Constant      15123.8    721.9  20.95      0
Flight Nurse     9354     5366   1.74  0.083

S = 10634.1   R-Sq = 1.4%   R-Sq(adj) = 0.9%

Logistic  Regression 

Logistic regression is performed to identify which characteristics can be used to
determine whether a patient will be in the high cost group or the low cost group. The
continuous characteristics height, weight, abdominal circumference, physical fitness
run score, and physical fitness total score are used because they provide the most
beneficial information and statistically significant p-values. The results are tested
iteratively using binary logistic regression, and outliers are removed by observing delta
chi-square values. Table 6 shows the results of the logistic regression. As shown, the
p-values are below 0.05 for all characteristics. All goodness-of-fit tests pass because
their p-values are greater than 0.05. The odds ratios show that physical fitness test
score has the strongest influence over whether a patient will end up in the high cost
group; for each one-point increase in test score, the odds that the patient ends up in
the high cost group increase by 15%. This is counterintuitive because members who are
more physically fit are expected to require less medical attention and therefore cost
less. It is important to note that these results apply only to the subpopulation and are
not indicative of all Air Force patients. A potential explanation is that members who
perform better on the physical fitness test are more likely to engage in strenuous
activity and therefore have the potential to incur higher costs for MSI diagnoses.
Additionally, height, physical fitness run score, and abdominal circumference have the
strongest influence over whether a patient will end up in the low cost group; as each of
these values increases by one unit, the odds that the patient ends up in the high cost
group decrease by 15%, 12%, and 16% respectively. The result that increased abdominal
circumference decreases the likelihood a patient will be in the high cost group is also
unexpected. It is also important to note that abdominal circumference is not normalized
for either height or gender, and thus requires further inspection beyond increased size.
With an odds ratio of 1.04, which is close to 1, weight minimally affects the likelihood
a patient will end up in the high cost group.


Table  6:  Logistic  Regression  Table 


Logistic Regression Table

                                                                        Odds     95% CI
Predictor                              Coef    SE Coef      Z       P   Ratio   Lower   Upper
Constant                            2.46992     4.8301   0.51   0.609
Height                            -0.159321    0.07152  -2.23   0.026    0.85    0.74    0.98
Weight                            0.0399098   0.011501   3.47   0.001    1.04    1.02    1.06
Physical Fitness Test Run Score   -0.125073   0.037619  -3.32   0.001    0.88    0.82    0.95
Physical Fitness Test Score        0.136253   0.044221   3.08   0.002    1.15    1.05    1.25
Abdominal Circumference           -0.169311   0.083556  -2.03   0.043    0.84    0.72    0.99

Log-Likelihood = -124.698
G = 22.603    DF = 5    P-Value = 0.000


Goodness-of-Fit Tests

Method             Chi-Square    DF       P
Pearson               214.153   224    0.67
Deviance              249.396   224   0.117
Hosmer-Lemeshow        13.623     8   0.092
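
The odds ratios and confidence limits in Table 6 follow from the coefficients and standard errors by exponentiation. The short sketch below reproduces them under the usual assumption of a normal-approximation (Wald) interval with a 1.96 multiplier; the function name is illustrative and the inputs are transcribed from the table.

```python
import math

# (coefficient, standard error) pairs transcribed from Table 6.
TABLE_6 = {
    "Height": (-0.159321, 0.07152),
    "Weight": (0.0399098, 0.011501),
    "Physical Fitness Test Run Score": (-0.125073, 0.037619),
    "Physical Fitness Test Score": (0.136253, 0.044221),
    "Abdominal Circumference": (-0.169311, 0.083556),
}

Z_95 = 1.96  # normal quantile assumed for a two-sided 95% interval


def odds_ratio_summary(coef, se):
    """Return (odds ratio, lower bound, upper bound, percent change in odds)."""
    odds_ratio = math.exp(coef)
    lower = math.exp(coef - Z_95 * se)
    upper = math.exp(coef + Z_95 * se)
    percent_change = (odds_ratio - 1.0) * 100.0
    return odds_ratio, lower, upper, percent_change


for name, (coef, se) in TABLE_6.items():
    ratio, lower, upper, change = odds_ratio_summary(coef, se)
    print(f"{name}: OR={ratio:.2f} CI=({lower:.2f}, {upper:.2f}) change={change:+.1f}%")
```

The percent change column is how the text arrives at statements such as a 15% increase in odds per fitness test point: it is the odds ratio minus one, expressed as a percentage.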

Continuity  of  Care  and  Healthcare  Costs 

Continuity of care is defined as the percentage of times a patient meets with the
same healthcare provider. Simple linear regressions are performed for each MSI
diagnosis against continuity of care, as well as for all diagnoses as a whole. Simple
linear regression is calculated under two scenarios: best case and worst case. The best
case scenario assumes that appointments where the provider IDs are missing from the
database are all with the same provider. The worst case scenario assumes that
appointments where the provider IDs are missing from the database are all with different
providers. The true value is estimated to fall between these two extremes.
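
The two scenarios can be illustrated with a small sketch. The study does not spell out the exact formula, so the code below makes two labeled assumptions: continuity is the share of appointments with the patient’s most frequently seen provider, and in the best case every missing provider ID is counted as that same provider. The function name and the use of None for a missing ID are hypothetical.

```python
from collections import Counter


def continuity_of_care(provider_ids, scenario="best"):
    """Share of appointments with the most frequently seen provider.

    provider_ids: one entry per appointment; None marks a missing provider ID.
    scenario: "best" counts every missing ID as the most frequent provider
              (assumption); "worst" treats each missing ID as a different,
              otherwise unseen provider.
    """
    if not provider_ids:
        raise ValueError("at least one appointment is required")
    known = [p for p in provider_ids if p is not None]
    missing = len(provider_ids) - len(known)
    top_count = max(Counter(known).values(), default=0)
    if scenario == "best":
        same_provider = top_count + missing
    elif scenario == "worst":
        # A lone unknown provider still accounts for one appointment.
        same_provider = max(top_count, 1 if missing else 0)
    else:
        raise ValueError("scenario must be 'best' or 'worst'")
    return same_provider / len(provider_ids)


visits = ["A", "A", None, "B", None]
print(continuity_of_care(visits, "best"), continuity_of_care(visits, "worst"))
```

Under these assumptions the two values bracket the unknown true continuity, which is the role the best and worst case scenarios play in the regressions that follow.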

Best  Case  Scenario 

Table 7 displays the results of each linear regression for each combination of
diagnosis and cost group. Highlighted in blue are the cases in which the p-values are <
0.05. For the high cost group, the p-value is < 0.05 for only the dorsopathy diagnosis,
whereas the low cost group has p-values < 0.05 for all diagnoses with the exception of
osteopathies. When all patients are combined into a single group, p-values are < 0.05
for all diagnosis types. Although the p-values for each case highlighted in blue are
below 0.05, the R-squared values for each of these equations are very low. Thus, no firm
conclusions can be drawn about the true impact continuity of care has on patient
appointment costs. Figure 3 shows the linear regression graphs for the statistically
significant cases. Figure 4 and Figure 5 show the residual versus fits plots and the
normal plots of residuals for all cases in which p-values are less than or equal to 0.05.
With the exception of arthropathy diagnoses for all patients, every residual versus fits
plot shows a pattern in which the variability in patient appointment costs increases as
continuity of care increases. The small variability at the lowest levels of continuity
of care has significant influence over the fitted regression lines, which have small
p-values and small R-squared values. This pattern violates the assumption of constant
variance along the regression line. Additionally, the normal plots of residuals clearly
show that in all cases the residuals are not normally distributed about the regression
line. This violates the second regression assumption, normality. Thus these regression
equations are not good models for determining the impact continuity of care has on
patient appointment costs.
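
The two quantities this argument relies on, R-squared and the spread of residuals along the fitted line, can be computed directly. The following is a minimal sketch rather than the Minitab procedure used in the study; the function names are illustrative, and the spread ratio is a crude stand-in for visually inspecting a residual versus fits plot.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared, residuals).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot, residuals


def residual_spread_ratio(x, residuals):
    """Mean squared residual in the upper half of x divided by the lower half.

    A ratio far from 1 hints at non-constant variance. The lower half must
    contain at least one non-zero residual.
    """
    ordered = [r for _, r in sorted(zip(x, residuals))]
    half = len(ordered) // 2
    lower, upper = ordered[:half], ordered[-half:]
    mean_sq_lower = sum(v * v for v in lower) / len(lower)
    mean_sq_upper = sum(v * v for v in upper) / len(upper)
    return mean_sq_upper / mean_sq_lower


slope, intercept, r_squared, _ = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept, r_squared)
print(residual_spread_ratio([1, 2, 3, 4], [1, -1, 3, -3]))
```

A slope can have a small p-value in a large sample while R-squared stays near zero, which is the situation reported in Table 7.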


Table  7:  Linear  Regression  Results  Best  Case  Scenario 


Best Case Scenario

Diagnosis        High Cost Group             Low Cost Group              All Patients
Arthropathies    p = 0.407; R^2 = 0.009      p = 0.019; R^2 = 0.0122     p = 0.00; R^2 = 0.0534
                 y = -358.75x + 7774.4       y = -659.92x + 2193.2       y = 985.14x + 2553.4
Dorsopathies     p = 0.007; R^2 = 0.0756     p = 0.00; R^2 = 0.1378      p = 0.00; R^2 = 0.0634
                 y = -8749.6x + 9739.7       y = -2187.5x + 3054.5       y = -5070.8x + 5878.9
Rheumatism       p = 0.126; R^2 = 0.0106     p = 0.003; R^2 = 0.0593     p = 0.016; R^2 = 0.0168
                 y = -3897.1x + 6964.6       y = -984.87x + 1604.5       y = -3316.4x + 4684.7
Osteopathies     p = 0.19; R^2 = 0.0182      p = 0.13; R^2 = 0.00        p = 0.013; R^2 = 0.0336
                 y = -2031.7x + 4318         y = -30.78x + 847.96        y = -2164.5x + 3485.3
All Diagnoses    p = 0.499; R^2 = 0.0021     p = 0.00; R^2 = 0.0632      p = 0.00; R^2 = 0.0175
                 y = -3387.3x + 14054        y = -2477.1x + 4012.3       y = -6452x + 9503.3


Figure 4: Best Case Scenario Residual versus Fit Plots - Patient Appointment Costs

Figure 5: Best Case Scenario Normal Plots for Residuals - Patient Appointment Costs


Worst Case Scenario

Table 8 displays the results for each combination of diagnosis and cost group for
the worst case scenario. Cases in which p-values are < 0.05 are highlighted in blue.
Similar to the best case scenario, the only case with a p-value < 0.05 for the high cost
group is for dorsopathy diagnoses, while the low cost group has p-values < 0.05 for all
diagnoses with the exception of osteopathies. With both cost groups combined into one
group, p-values are < 0.05 for all cases. Although the p-values for each case highlighted
in blue are < 0.05, the R-squared values for each of these equations are very low. Thus,
no firm conclusions can be drawn about the true impact continuity of care has on patient
appointment costs. Figure 6 shows the linear regression graphs of the cases which are
statistically significant. Figure 7 and Figure 8 display the residual versus fits plots
and the normal plots of residuals for the cases in which p-values were less than or equal
to 0.05. Similar to the best case scenario, every residual versus fits plot shows a
pattern in which the variability in patient appointment costs increases as continuity of
care increases. This pattern violates the assumption of constant variance along the
regression line. Additionally, the normal plots of residuals clearly show that in all
cases the residuals are not normally distributed about the regression line. This violates
the second regression assumption, normality. Thus these regression equations are not good
models for determining the impact continuity of care has on patient appointment costs in
the worst case scenario.


Table 8: Linear Regression Results Worst Case Scenario

Worst Case Scenario

Diagnosis        High Cost Group             Low Cost Group              All Patients
Arthropathies    p = 0.656; R^2 = 0.00026    p = 0.00; R^2 = 0.1565      p = 0.017; R^2 = 0.0159
                 y = -325.65x + 8031         y = -2232.5x + 2733.2       y = -909.45x + 2800.5
Dorsopathies     p = 0.001; R^2 = 0.066      p = 0.00; R^2 = 0.1235      p = 0.00; R^2 = 0.0732
                 y = -7211x + 8391.4         y = -1896.4x + 2780.9       y = -4932.5x + 5485.1
Rheumatism       p = 0.093; R^2 = 0.0174     p = 0.00; R^2 = 0.1058      p = 0.003; R^2 = 0.0262
                 y = -4184x + 6733.6         y = -1290.8x + 1761.7       y = -3739x + 4666.8
Osteopathies     p = 0.224; R^2 = 0.0157     p = 0.083; R^2 = 0.035      p = 0.02; R^2 = 0.0296
                 y = -1797.5x + 4138.4       y = -463.62x + 1168.3       y = -1910.5x + 3275.4
All Diagnoses    p = 0.056; R^2 = 0.0166     p = 0.00; R^2 = 0.1662      p = 0.00; R^2 = 0.1028
                 y = -11531x + 17883         y = -3833.7x + 4330.7       y = -18546x + 14350


[Panels: Continuity of Care vs. Patient Appointment Costs for Arthropathy Diagnoses (Low
Cost Group) and for Dorsopathy Diagnoses (Low Cost Group)]

Figure 6: Worst Case Scenario Regression Graphs

Figure 7: Worst Case Scenario Residual versus Fits Plots - Patient Appointment Costs

[Normal probability plots; responses include Appt. Costs Dorsopathies - HC and Appt.
Costs Arthropathies - LC]

Figure 8: Worst Case Scenario Normal Plot of Residuals - Patient Appointment Costs


Continuity of Care and Patient Availability

Patient availability is defined as the number of days in a calendar year that a
patient is on flying status and available to fly. Linear regression is performed on
patient availability against continuity of care to determine whether continuity of care
influences patient availability. This is completed for both the best case and worst case
scenarios. The linear regression is performed cumulatively for each year from 2009
through 2012. For example, 2011 includes patient availability calculations using data for
2009, 2010, and 2011. Figures display graphs for cases in which p-values are less than or
equal to 0.05.
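
The cumulative treatment of years can be sketched as a running total of available days. The function name and the day counts below are invented for illustration; only the accumulation rule (each year includes all earlier years from 2009 onward) comes from the text.

```python
from itertools import accumulate


def cumulative_availability(days_by_year):
    """Map each year to the total available days from the first year through it."""
    years = sorted(days_by_year)
    running_totals = accumulate(days_by_year[year] for year in years)
    return dict(zip(years, running_totals))


# Hypothetical available days per calendar year for one patient.
example = {2009: 300, 2010: 280, 2011: 310, 2012: 290}
print(cumulative_availability(example))
```

This is consistent with the regression intercepts in Tables 9 and 10 growing from year to year, since later years aggregate more calendar days.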

Best  Case  Scenario 

Table 9 shows the results of the regression analysis that examines continuity of
care against patient availability. Cases in which p-values are < 0.05 are highlighted in
blue. For the low cost group, 2010 is the only year in which the p-value is close to
0.05; with a p-value of 0.053, its proximity to the 0.05 threshold allows it to be
highlighted for this study. For the high cost group, the cases in which p-values are <
0.05 are in years 2010 and 2011. The graphs in Figure 9 show steeper regression lines for
the high cost group than for the low cost group. Though the p-values are < 0.05, the
R-squared values are still relatively low; thus no firm conclusions can be drawn about
the relationship between continuity of care and patient availability. Figure 10 and
Figure 11 show the residual versus fits plots and the normal plots of residuals for the
cases where p-values are less than or equal to 0.05. In the residual versus fits plots,
the variance appears constant in all cases; therefore there is no violation of the
constant variance assumption of regression. In the normal plots of residuals, the
normality assumption appears to be violated for the low cost group but not for the high
cost group. Given this violation, the regression equation is not a good model for
determining the impact continuity of care has on patient availability for the low cost
group. Although there are no clear violations of the regression assumptions for the high
cost group, the low R-squared values still indicate no practical significance in the
relationship between continuity of care and patient availability.


Table  9:  Patient  Availability  Regression  Results  Best  Case  Scenario 


Best Case Scenario

Year    High Cost Group             Low Cost Group
2009    p = 0.51; R^2 = 0.0031      p = 0.216; R^2 = 0.0077
        y = 27.604x + 271.47        y = 44.791x + 255.94
2010    p = 0.029; R^2 = 0.0304     p = 0.053; R^2 = 0.0103
        y = 160.16x + 375.62        y = 91.272x + 470.48
2011    p = 0.033; R^2 = 0.0201     p = 0.337; R^2 = 0.0021
        y = 229.45x + 533.52        y = 64.669x + 739.27
2012    p = 0.227; R^2 = 0.0052     p = 0.36; R^2 = 0.0018
        y = 162.43x + 843.95        y = 72.585x + 1059.4


Figure 9: Best Case Scenario Continuity of Care vs. Patient Availability Graph

[Residual versus fits plots; responses: 2010 HC, 2010 LC, and 2011 HC Patient
Availability]

Figure 10: Best Case Scenario Residual Plots - Patient Availability

[Normal probability plots; responses: 2010 HC, 2011 HC, and 2010 LC Patient Availability]

Figure 11: Best Case Scenario Normal Plots of Residuals - Patient Availability


Worst  Case  Scenario 

Table 10 shows the results of the regression analysis that examines continuity of
care against patient availability. Cases in which p-values are < 0.05 are highlighted in
blue. For the high cost group, the cases with p-values < 0.05 are in years 2010 and 2011.
Though the p-values are < 0.05, the R-squared values are still relatively low; thus no
firm conclusions can be drawn about the relationship between continuity of care and
patient availability. Figure 12 shows the regression graphs for all cases in which
p-values are less than or equal to 0.05. Figure 13 and Figure 14 show the residual versus
fits plots and the normal plots of residuals for the years in which p-values were less
than or equal to 0.05. Similar to the best case scenario, the residual versus fits plots
for the cases in which p-values are less than or equal to 0.05 show constant variance
along the regression line. This indicates no violation of the regression assumption of
constant variance. Also, in the normal plots of residuals, there is no clear violation of
normality. Thus these are acceptable models for determining the impact continuity of care
has on patient availability in the worst case scenario. However, the low R-squared values
still call into question the strength of the relationship between continuity of care and
patient availability.


Table  10:  Patient  Availability  Regression  Results  Worst  Case  Scenario 


Worst Case Scenario

Year    High Cost Group             Low Cost Group
2009    p = 0.291; R^2 = 0.0081     p = 0.896; R^2 = 0.00
        y = 36.032x + 269.56        y = 4.1548x + 280.66
2010    p = 0.01; R^2 = 0.0328      p = 0.578; R^2 = 0.00
        y = 138.34x + 403.55        y = 23.101x + 512.07
2011    p = 0.004; R^2 = 0.0381     p = 0.924; R^2 = 0.00
        y = 256.05x + 554.29        y = 5.4858x + 774.3
2012    p = 0.115; R^2 = 0.00114    p = 0.627; R^2 = 0.00
        y = 199.15x + 862.86        y = 32.699x + 1084.7


Figure 12: Worst Case Scenario Patient Availability vs. Continuity of Care Graph

[Residual versus fits plots; responses: 2010 HC and 2011 HC Patient Availability]

Figure 13: Worst Case Scenario Residual versus Fits Plots - Patient Availability

[Normal probability plots; responses: 2010 HC and 2011 HC Patient Availability]

Figure 14: Worst Case Scenario Normal Plots of Residuals - Patient Availability


Conclusion 

This chapter analyzes the characteristics that make up different healthcare cost
groups within the Air Force flier community. Multivariate and simple linear regression
are used to make this determination. The results are unable to conclude that any of the
characteristics chosen for this study are predictive of costs. Continuity of care is also
analyzed to see how it impacts healthcare cost and patient availability. The analysis
shows that continuity of care explains very little of the variability observed in patient
appointment costs and patient availability. Further analysis should be performed on a
broader population to validate the generalizability of this conclusion.

V. Conclusions and Recommendations


Chapter  Overview 

Multiple  data  mining  techniques  are  utilized  in  this  study  to  determine  if 
continuity  of  care  impacts  healthcare  costs  and  patient  availability;  different  cost  groups 
and  defining  characteristics  are  also  identified  in  this  study.  The  results  provide  insights 
into  the  current  Air  Force  healthcare  system  and  can  be  used  to  improve  upon  the  current 
model. 

Investigative  Questions 

Investigative  Question  1:  Are  there  different  cost  profiles  that  make  up  this 
population? 

This  question  is  answered  by  calculating  the  highest  cost  patients  and  the 
percentage  of  the  total  cost  they  contribute.  This  is  done  by  sorting  the  population  in 
order  by  patient  total  costs  and  determining  the  percentage  of  the  subpopulation  total 
costs  the  highest  group  accounts  for.  Using  this  method,  the  top  30%  of  patients  are 
chosen  as  the  high  cost  group  given  they  account  for  70%  of  the  subpopulation  total 
costs.  This  follows  the  hypothesized  Pareto  Rule  that  a  minority  percentage  of  the 
population  is  responsible  for  a  majority  percentage  of  healthcare  costs.  This  adds  benefit 
to  Air  Force  healthcare  researchers  in  that  the  identification  of  the  high  cost  group 
enables  research  to  be  scoped  to  target  this  specific  group  while  still  targeting  a  majority 
of  healthcare  costs. 
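
The sorting-and-share calculation described above can be sketched in a few lines. The function name and the cost figures are invented for illustration and are not the study’s data; rounding the group size to the nearest whole patient is likewise an assumption.

```python
def top_share(costs, top_fraction):
    """Fraction of total cost attributable to the costliest `top_fraction` of patients."""
    if not costs or not 0 < top_fraction <= 1:
        raise ValueError("need at least one cost and 0 < top_fraction <= 1")
    ordered = sorted(costs, reverse=True)
    # Group size rounded to the nearest whole patient, with at least one patient.
    group_size = max(1, round(top_fraction * len(ordered)))
    return sum(ordered[:group_size]) / sum(ordered)


# Hypothetical total costs for ten patients.
patient_costs = [400, 200, 100, 50, 50, 50, 50, 40, 30, 30]
print(top_share(patient_costs, 0.30))
```

Scanning `top_fraction` over a range of values is one way to locate a cutoff such as the 30% group that accounts for 70% of costs.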

Investigative Question 2: What are the defining characteristics of the different
cost populations?

Low Cost Group

Analysis of variance and multivariate linear regression are performed on each
characteristic against patient appointment costs in order to determine whether there are
characteristics that are predictive of cost. Based on the analysis of variance test,
there are no statistically significant cost differences between the different
characteristics chosen for this study. Multivariate linear regression shows that
Other/Unknown race and the pilot career field have p-values less than 0.05 in each
analysis technique; therefore these characteristics are assumed to be predictive of
patient appointment cost for the low cost group. Multivariate regression shows that if a
patient’s race is Other/Unknown, their mean appointment costs are expected to be $824.40
lower. Additionally, if a patient is in the pilot career field, the mean appointment
costs are expected to be $450 lower. The adjusted R-squared value of 2.4% indicates that
these characteristics account for only a small portion of the variability in patient
appointment costs.

High Cost Group

There are no statistically significant results that show that any personal or
organizational factors influence patient appointment costs for the high cost group.

Determining Which Cost Group a Patient Belongs To

Binary logistic regression is performed to determine which characteristics predict
whether a patient is in the high cost group. Height, weight, fitness test run score,
fitness test score, and abdominal circumference provide the best prediction, with odds
ratios of 0.85, 1.04, 0.88, 1.15, and 0.84 respectively. The odds of an event are the
ratio of the probability of the event to the probability of a non-event. The odds ratio
is interpreted as the factor by which the odds of a patient being in the high cost group
are multiplied for each unit increase in the characteristic. This information can be
helpful in predicting the likelihood that a patient will end up in the high cost group.
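
To make the interpretation of an odds ratio concrete, the sketch below converts a probability to odds, applies an odds ratio for a given number of unit increases, and converts back. The function names and the 30% starting probability are illustrative assumptions, not quantities estimated in the study.

```python
def probability_to_odds(probability):
    """Odds = P(event) / P(non-event)."""
    return probability / (1.0 - probability)


def odds_to_probability(odds):
    """Inverse of probability_to_odds."""
    return odds / (1.0 + odds)


def apply_odds_ratio(probability, odds_ratio, units=1):
    """Probability after `units` one-unit increases, each multiplying the odds."""
    new_odds = probability_to_odds(probability) * odds_ratio ** units
    return odds_to_probability(new_odds)


# Illustrative only: a patient with a 30% chance of being in the high cost group
# and a one-point higher fitness test score (odds ratio 1.15 from Table 6).
print(round(apply_odds_ratio(0.30, 1.15), 3))
```

Note that the odds ratio multiplies the odds, not the probability, so a 15% increase in odds corresponds to a smaller change in probability.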

Investigative Question 3: How does continuity of care impact healthcare costs?
Does the impact differ for high vs. low cost groups?

Continuity of care is calculated using the percentage of appointments in which a
patient meets with the same provider. Simple linear regression is used to determine the
relationship between continuity of care and healthcare costs. Due to low R-squared values
and violation of regression assumptions, it is concluded that continuity of care explains
very little of the variability observed in patient appointment costs.

Investigative Question 4: How does continuity of care impact patient availability?

Simple linear regression is used to determine the impact continuity of care has on
patient availability. Due to low R-squared values and violation of regression
assumptions, it is concluded that continuity of care explains very little of the
variability observed in patient availability.

Significance  of  Research 

The sponsor for this research is the 711th Human Performance Wing (HPW) at
Wright Patterson Air Force Base, OH. The Air Force’s healthcare model currently
consists of primary care managers (PCMs) and PCM teams. Every patient is assigned a
PCM and thus a PCM team. Current policy states the goal that each patient meets with
their PCM for 70% of their appointments, or with a member of their PCM team for 90% of
their appointments. The research suggests that there are no measurable benefits to cost
or patient availability with increased continuity of care. This information is beneficial
to the Air Force because it can be used to redefine the continuity of care goals and help
prioritize other important aspects of healthcare. It is important to note that this study
does not investigate the other benefits associated with continuity of care, such as
improved quality of care, decreased emergency room visits, and decreased number of
appointments needed. Before ruling out the need for continuity of care, it may be
important to explore these other measures to determine if increased continuity of care
adds value to these areas within the Air Force’s healthcare system.
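
The policy goal described above can be expressed as a simple check on a patient’s appointment history. This is a hedged sketch: the function name, identifiers, and data layout are hypothetical, and the thresholds are simply the 70% and 90% figures quoted in the text.

```python
def meets_continuity_goal(visits, pcm_id, team_ids, pcm_goal=0.70, team_goal=0.90):
    """True if the PCM share or the PCM-team share of visits reaches its goal.

    visits: provider ID for each appointment (hypothetical identifiers).
    team_ids: IDs of the other members of the patient's PCM team.
    """
    if not visits:
        raise ValueError("at least one appointment is required")
    team = set(team_ids) | {pcm_id}
    pcm_share = sum(provider == pcm_id for provider in visits) / len(visits)
    team_share = sum(provider in team for provider in visits) / len(visits)
    return pcm_share >= pcm_goal or team_share >= team_goal


history = ["pcm"] * 5 + ["nurse_a"] * 4 + ["other_clinic"]
print(meets_continuity_goal(history, "pcm", ["nurse_a", "tech_b"]))
```

In this example the patient sees the PCM for only half of the visits but meets the goal through the team share of 90%.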

Recommendations  for  Future  Research 

The next step for furthering research on continuity of care within the Air Force is
to expand the research beyond the subpopulation chosen for this study. This research
investigates a subpopulation of Air Force active duty fliers who were off flying status
due to an MSI in July of 2009; expanding this subpopulation to include non-MSI
diagnoses can provide insight into whether the findings presented herein are specific to
MSIs or whether the influence continuity of care has on healthcare costs and patient
availability is similar for other diagnoses. Furthermore, while none of the personal or
organizational characteristics investigated in this study were found to influence patient
appointment costs, exploring other characteristics that may be better predictors of costs
could provide beneficial insights on drivers of increased healthcare costs.

Summary 

This chapter examines each investigative question as stated in the overview
chapter, and the conclusions that are drawn based on the results of the analysis. Next,
the chapter covers how these conclusions are significant to the sponsor and the
recommendations that can be made for further research.




VI. Appendix


Appendix  A:  IRB  Approval  Letters 


Data Mining Simulation & Optimization of Healthcare Information to Determine
Influences on Healthcare Costs & Patient Outcomes
FWR20140109E 


1. Principal Investigator

Capt Christina Rusnock, USAF/AFIT, 785-3636x4611, christina.rusnock@afit.edu

2. Associate Investigators

Lt Col Anthony Tvaryanas, 711 HPW/HP, 798-3253, anthony.tvaryanas@us.af.mil
Capt David Wade, USAF/AFIT, 785-3636, david.wade@afit.edu

3. Facility

Secondary data analyses will be conducted at the Air Force Institute of Technology at
Wright Patterson Air Force Base, Ohio. Data will be stored on AFIT servers with
access restricted to the principal investigator and associate investigators listed in
sections 1 and 2.

4.  Objective 

This research seeks to identify indicators and characteristics of Air Force personnel that
contribute to excessive healthcare costs. In order to perform this research, intensive data
mining of the health services data will be required. The results of this research can assist
in identifying leading indicators of high healthcare costs and processes that contribute to
excessive costs. Potential benefits include recommendations on developing manpower
requirements per career field based on medical history, managing the staffing levels of
healthcare professionals, and implementing other process improvements to the current
system in order to reduce long-term Air Force healthcare costs.

5.  Background 

The cost of the Air Force healthcare system has been growing at a rapid pace [1]. It is
hypothesized that the cost is not evenly distributed amongst the entire Air Force
population, but instead that a large percentage of costs stems from only a small
portion of the population [2].

Our  goal  is  to  implement  formal  data  mining  techniques  to  identify'  this  high-cost  sub- 
population.  deteimme  the  characteristics  of  this  population,  identify  predictors  of  this 
population,  and  ultimately  optimize  healthcare  process  based  on  this  population. 
Leveraging  the  knowledge  of  the  subject  matter  expert.  Lt  Col  Anthony  Tvaryanas.  we 
will  be  perfonmng  data  mining  techniques,  includmg.  but  not  liimted  to  logistic 
regression  and  multivanate  linear  regression.  Based  on  findings  fiom  the  data  mining, 
simulation  will  then  be  used  to  capture  the  variabihty  and  emergent  system  behavior  and 
system  dynamics.  The  information  from  both  the  data  mining  and  simulation  will  be 
used  to  build  parameters  and  constraints  to  formulate  an  optimization  problem. 

Our  secondary  goal  is  to  conduct  a  process  improvement  study  on  a  mihtary  clime 
located  at  Wright  Patterson  AFB  implementing  formal  discrete-event  simulation 
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•  Is  there  a  small  portion  of  the  AF  population  that  drives  healthcare  costs  or  patient 
availabihty?  ( in  a  given  year,  cumulative?) 

•  What  was  the  lost  duty  time  (for  special  duty  personnel)  associated  with  disease 
conditions? 

• What disease conditions are correlated within individuals?

• How are disease conditions and health services utilization correlated within
individuals and their associated beneficiaries?

• Are there unique characteristics that make up this sub-population (AFSC,
deployment status, fitness, geo location, age)?

• What characteristics of patient care (continuity, provider type) result in lower cost
or increased availability for this sub-population?

•  How  does  the  cost  benefit  of  continuity  of  care  change  in  chronically  ill  patients 
versus  patients  with  acute  illnesses? 

• Does continuity of care impact healthcare costs? Does the impact differ for high
cost vs. low cost populations?

• Does continuity of care impact patient availability? Does the impact differ for
high vs. low cost populations?

• Do high-cost diagnostic tests increase availability? Do higher-cost treatments
increase availability?


The research project will also utilize centralized medical databases maintained by
AF/SG6 to conduct a simulation study on the clinics located at Wright Patterson AFB.
Ms. Genny Maupin will be the 3rd party data broker with AF/SG6 and will be responsible
for de-identifying the data prior to transferring it to the PIs. The objective of this
simulation is to identify the baseline process of the current system, identify where the
bottlenecks occur, and identify the relative cost-effectiveness of current and alternative
personnel staffing levels and processes, as well as mitigate the patient wait time. Data will
be obtained from the AFPC Personnel database, Defense Enrollment Eligibility
Reporting System (DEERS), the Aviation Safety Information Management System
(ASIMS), the Air Force Fitness Management System (AFFMS), the Cardiac Risk
Assessment and Management (CRAM) database, the Aeromedical Information
Management Waiver Tracking System (AIMWTS), and the Military Health System
(MHS) Data Mart (M2), which contains on-base outpatient clinic visits for the entire Air
Force. Data collected will include demographics (i.e. age and gender), encounter dates,
duty not including flying information (DNIF) codes, fitness restrictions, physical fitness
test scores, cardiac risk scores, waivers for flyers, duty location and Air Force Specialty
Code (AFSC), and procedure and diagnosis codes to assess the utilization of services and
investigate the associated disease conditions. Data collected for the simulation study
will include the patient category (i.e. military, civilian and dependent), appointment type,
diagnosis codes, appointment status, clinics located at Wright Patterson AFB and
provider type to assess the probabilities of a particular type of patient visiting the clinic.

techniques to identify opportunities the military healthcare system can implement to
advance the efficiency and effectiveness of our military healthcare system. The
discrete-event simulation will map the process of a base level health clinic and identify the
relative cost-effectiveness of current and alternative personnel staffing levels and
processes. The results of our findings will aid the clinic to minimize staffing cost, identify
bottlenecks in the system, minimize patient wait time, maximize the utilization of its
medical personnel, and ultimately increase the efficiency and effectiveness of the military
healthcare system.
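The staffing comparison described above can be illustrated with a very small queueing sketch. The protocol names Arena as the simulation tool; the Python below is only an assumed stand-in that models one first-come, first-served queue feeding `n_providers` identical providers. The function names, arrival rate, and service time are hypothetical and are not taken from the study.

```python
import heapq
import random


def simulate_clinic(arrivals, service_times, n_providers):
    """Return each patient's wait given sorted arrival times and service times.

    Patients are seen first-come, first-served by whichever provider
    becomes free earliest.
    """
    if n_providers < 1:
        raise ValueError("need at least one provider")
    free_at = [0.0] * n_providers      # time at which each provider is next free
    heapq.heapify(free_at)
    waits = []
    for arrive, service in zip(arrivals, service_times):
        earliest = heapq.heappop(free_at)        # provider who frees up first
        start = max(arrive, earliest)            # patient cannot start before arriving
        waits.append(start - arrive)
        heapq.heappush(free_at, start + service)
    return waits


def random_day(n_patients, mean_gap, mean_service, seed=0):
    """Draw hypothetical exponential inter-arrival gaps and service times (minutes)."""
    rng = random.Random(seed)
    clock, arrivals, services = 0.0, [], []
    for _ in range(n_patients):
        clock += rng.expovariate(1.0 / mean_gap)
        arrivals.append(clock)
        services.append(rng.expovariate(1.0 / mean_service))
    return arrivals, services


# Compare mean wait under two assumed staffing levels on the same simulated day.
arrivals, services = random_day(200, mean_gap=10.0, mean_service=18.0)
for k in (2, 3):
    waits = simulate_clinic(arrivals, services, k)
    print(k, "providers: mean wait", round(sum(waits) / len(waits), 1), "minutes")
```

Running both staffing levels against the same arrivals and service times isolates the effect of the staffing change from random variation, which is the kind of comparison a cost-effectiveness analysis of alternative staffing levels would need.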


6.  Impact  to  Air  Force  Mission 

Given that the DoD health care costs are growing more than twice as fast as
economy-wide medical inflation, there is reason for serious concern that increasing health care
expenditures will reduce resource availability for other important defense programs and
undermine  the  overall  capability  of  the  U.S.  military.  Consequently,  there  is  an  urgent 
need  to  bring  health  care  costs  into  a  sustainable  range,  and  all  aspects  of  the  defense 
health  portfolio  must  be  subject  to  critical  review — and  aerospace  medicine  can  be  no 
exception. 

7. Experimental Plan

a.  Equipment: 

Existing computers that have Arena, Minitab, JMP, and Microsoft Office 2007 software
installed will be used. These computers require CAC-enabled access.

b.  Subjects: 

Active duty Air Force special duty personnel and their beneficiaries. Additional analyses
will focus on the Wright Patterson AFB population utilizing the clinics at Wright Patterson
AFB.

c.  Duration: 

Timeframe for data analysis is June 2014 to June 2016; timeframe for completion of the
study is approximately 24 months.

d. Description of experiment, data collection, and analysis:

The research project will utilize centralized medical databases maintained by AF/SG6 to
perform a cross-sectional audit over a 10-year period (CY03-CY13) of the overall Air Force
healthcare system. As such, all appropriate data use agreements will be obtained prior to
acquiring data. Data will be pulled by Ms. Genny Maupin; personal identifiers will be
stripped before data are forwarded for analysis. The objective of this
audit is to find the characteristics that define the population of active duty Air Force
members and their dependents that make up the highest percentage of Air Force
healthcare costs. In addition, this audit will look to find the differences in recovery time
for persons with musculoskeletal injury that had the highest continuity with the health
care provider. Specific information will be elicited on the following questions:

The study will also collect time data (i.e. the time it takes for a process to be completed)
at the clinic to develop probability distributions for the simulation.

Regression  analysis  will  be  performed  to  determine  which  illnesses  or  characteristics  of 
patients  make  up  the  highest  percentage  of  healthcare  cost  annually,  and  how  these  have 
changed  over  the  years.  To  tailor  our  research,  it  will  be  necessary  to  have  available  as 
much  information  as  possible  to  determine  which  characteristics  correlate  with  costs. 
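As a simplified illustration of the regression step, the sketch below fits an ordinary least-squares line relating one hypothetical predictor to cost. It is a single-predictor stand-in for the multivariate analysis the protocol describes (which names Minitab and JMP as the tools); the function name `fit_line`, the variable names, and the data are assumptions for illustration only.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 0.0
    return slope, intercept, r_squared


# Made-up data: a patient characteristic (x) against annual cost (y).
characteristic = [0.2, 0.4, 0.5, 0.7, 0.9]
annual_cost = [1200.0, 950.0, 1010.0, 870.0, 900.0]
slope, intercept, r_squared = fit_line(characteristic, annual_cost)
print("slope:", slope, "intercept:", intercept, "R^2:", r_squared)
```

A slope near zero together with a low R^2 would indicate that the characteristic explains little of the variation in cost; a full analysis would also need significance tests and additional predictors.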

Since PII and PHI will be collected over a ten-year period and involve up to 100,000
participants, it is unrealistic to obtain informed consent for the PII and HIPAA
authorization for the PHI. Hence, both a waiver of informed consent and a waiver of
HIPAA authorization are needed.

Accordingly, a waiver of informed consent is hereby requested. Per 32 CFR
219.116(d), “an IRB may approve a consent procedure which does not include, or which
alters, some or all of the elements of informed consent set forth in this section, or waive
the requirements to obtain informed consent provided the IRB finds and documents that:

i. The research involves no more than minimal risk to the subjects;

ii. The waiver or alteration will not adversely affect the rights and welfare of
the subjects;

iii. The research could not practicably be carried out without the waiver or
alteration; and

iv. Whenever appropriate, the subjects will be provided with additional
pertinent information after participation.”

e. In the case of this protocol, the research involves no more than minimal risk to the
subjects, as identifying information will only be used to link data from disparate
information sources and will then be removed from the dataset. Additionally, the research
could not practicably be carried out without the waiver, as informed consent from each
research subject is not possible due to the difficulty in locating each subject given the
time frame of interest (i.e., 10-year period), the size of the sample (which will include
several thousand subjects), and the short time frame over which the study will be conducted.

f. Safety monitoring:

Not  applicable,  as  study  is  minimal  risk. 

g. Confidentiality protection:

Computers used for data management will be located at the Air Force Institute of
Technology at Wright Patterson AFB, OH, in building 640. The PI's and AIs' computers
require appropriate access, i.e., Common Access Card (CAC). In addition, all personal
identifiers will be stripped prior to analysis, once data are merged, and random numbers
will be assigned to each individual in the dataset. No key or code will be kept linking the
random numbers to the Personally Identifiable Information (PII) or Protected Health
Information  (PHI).  Data  will  not  be  analyzed  or  investigated  until  the  identifiers  have 
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been  stripped.  The  associate  investigators  have  all  completed  CITI  training  and  possess 
active  Secret  or  Top  Secret  security  clearances.  Data  will  be  deleted  once  the  study  is 
complete  (approximately  24  months). 
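The de-identification step described above (merge on the identifier, replace it with a random number, keep no linking key) can be sketched as follows. This is an illustrative assumption of how such a step might look, not the data broker's actual procedure; the field names `ssn` and `name` and the function `deidentify` are hypothetical.

```python
import secrets


def deidentify(records, id_field="ssn", pii_fields=("name",)):
    """Replace the identifier with a random subject number and drop PII fields.

    The identifier-to-number mapping lives only inside this call and is
    discarded on return, so no key linking the numbers to individuals is kept.
    """
    mapping = {}
    used = set()
    cleaned = []
    for record in records:
        person = record[id_field]
        if person not in mapping:
            token = secrets.token_hex(8)
            while token in used:                 # guard against a repeated token
                token = secrets.token_hex(8)
            used.add(token)
            mapping[person] = token
        row = {k: v for k, v in record.items()
               if k != id_field and k not in pii_fields}
        row["subject_id"] = mapping[person]      # same person, same random number
        cleaned.append(row)
    return cleaned


# Hypothetical merged records from two sources for the same individual.
merged = [
    {"ssn": "000-00-0001", "name": "A", "visits": 3},
    {"ssn": "000-00-0002", "name": "B", "visits": 1},
    {"ssn": "000-00-0001", "name": "A", "visits": 2},
]
for row in deidentify(merged):
    print(row)
```

Because the random number is assigned after the merge, records belonging to the same individual stay linked to each other in the analysis dataset while no longer being traceable to that individual.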


9.  Risk  Analysis 

The main risk to subjects is potential release of PII and PHI. This risk will be mitigated
by the procedures in 8f. Another risk to subjects is the release of findings that could
potentially shed a negative light on the career fields studied. All reports and
presentations will be routed through the appropriate Public Affairs (PA) and Scientific
and Technical Information (STINFO) channels prior to release outside the organization.
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DEPARTMENT  OF  THE  AIR  FORCE 

AIR  FORCE  RESEARCH  LABORATORY 
WRIGHT-PATTERSON AIR FORCE BASE OHIO 45433


MEMORANDUM FOR USAF/AFIT (CAPT CHRISTINA RUSNOCK)

FROM: 711 HPW/IR (AFRL IRB)

SUBJECT: IRB approval for the use of human volunteers in research

1. Protocol title: Data Mining Simulation & Optimization of Healthcare Information to
Determine Influences on Healthcare Costs & Patient Outcomes

2. Protocol number: FWR20140109E

3.  Protocol  version:  1.01 

4.  Risk:  N/A 

5.  Approval  date:  4  September  2014 

6.  Expiration  date:  N/A 

7. Scheduled renewal date: N/A

8.  Type  of  review:  Exempt 

9.  Assurance  Number  and  Expiration  Date:  N/A 

10. CITI Training: Completed

11. The above protocol has been reviewed and determined to be exempt from IRB oversight.
The objective of the study is to identify indicators and characteristics of Air Force
personnel that contribute to excessive healthcare costs, in order to have data upon which
to provide recommendations about manpower requirements, staffing levels and other
healthcare system improvements that will ultimately reduce the cost of providing
healthcare. Up to 100,000 subject health records are expected to be included in the
retrospective health care record data mining effort. Access to PHI databases will be
provided by SG6 and proper data use agreements will be in place. The data will be
collected by a disinterested third party (Ms. Gen Maupin) who will provide a fully
de-identified database to the Principal Investigator for analysis. No identifiable data will be
accessed by or recorded by the researchers. Amendments: Changes to Section 7 part d,
which include a change to the focus area of continuity of healthcare providers with
persons with musculoskeletal injury, three additional research questions on impact of
continuity of care on cost and patient availability, four new databases that data will be
collected from, and six new data fields that will be obtained and analyzed. This protocol
therefore meets the criteria for exemption in accordance with 32 CFR 219.101(b)(4),
which exempts “Research, involving the collection or study of existing data, documents,
records, pathological specimens, or diagnostic specimens, if these sources are publicly
available or if the information is recorded by the investigator in such a manner that
subjects cannot be identified, directly or through identifiers linked to the subjects.”

12. HIPAA authorization is required to access PHI from AHLTA for these research purposes.
A HIPAA waiver is granted, having found this to be a minimal risk study wherein the study
could not be conducted without access to the PHI, consent could not practicably be
obtained, PHI accessed is limited to the minimal number of records needed to meet
the research goal, and adequate privacy and security safeguards are in place.

13. FDA regulations do not apply since no drugs, supplements, or unapproved medical
devices will be used in this research.


14. This exemption applies only to the requirements of 32 CFR 219, DoDI 3216.02,
AFI 40-402, and related human research subject regulations.

15. With this approval comes the expectation that the Principal Investigator has the funding
to fully execute the protocol. Partial protocol funding, particularly with Greater than
Minimal Risk studies, should prompt a re-examination of the protocol by both the
Principal Investigator and the IRB with specific emphasis on the risk-benefit evaluation.

16. Any serious adverse event or issues resulting from this study should be reported
immediately to the IRB. Amendments to protocols and/or revisions to informed consent
documents must have IRB approval prior to implementation. Please retain both hard
copy and electronic copy of the final approved protocol and informed consent document.

17. The IRB must be notified if there is any change to the design or procedures of the
research to be conducted. Otherwise, no further action is required. All inquiries and
correspondence concerning this protocol should include the protocol number and name of
the primary investigator.


18. For questions or concerns, please contact the IRB administrator, Lt Eric Fergueson at
william.fergueson@us.af.mil or (937) 904-8094. All inquiries and correspondence
concerning this protocol should include the protocol number and name of the primary
investigator.


[Digitally signed by LONDON.KIM.ELIZABETH]


KIM E. LONDON, JD, MPH, CIP
Chair, AFRL IRB

1st Indorsement, USAF/AFIT (CAPT CHRISTINA RUSNOCK), Memo, 4 September 2014,
IRB Approval for the Use of Humans in Research, Expedited Review, Exempt Approval,
Protocol Number FWR20140109E

MEMORANDUM FOR 711 HPW/IR (KIM LONDON)

I have reviewed the hardcopy and electronic records and found them to be complete and
accurate.

[Digitally signed by FERGUESON.WILLIAM.ERIC]

W. ERIC FERGUESON, 2LT, USAF
Lead Administrator, AFRL IRB

Appendix B: Best Case Scenario for Continuity of Care vs. Patient Appointment Costs Graphs for p-values > 0.05

Figure 15: Best Case Scenario Regression Graphs (Cases p > 0.05)

Appendix C: Worst Case Scenario for Continuity of Care vs. Patient Appointment Costs Graphs for p-values > 0.05

Figure 16: Worst Case Scenario Regression Graphs (Cases p > 0.05)

Appendix D: Best Case Scenario Continuity of Care vs. Patient Availability Graphs for p-values > 0.05

Figure 17: Best Case Scenario Patient Availability vs. Continuity of Care Graph (Cases p > 0.05)

Appendix E: Worst Case Scenario Continuity of Care vs. Patient Availability Graphs for p-values > 0.05

Figure 18: Worst Case Scenario Patient Availability vs. Continuity of Care Graph (Cases p > 0.05)
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